Robert B. Fisher 

Toby P. Breckon 

Kenneth Dawson-Howe
Andrew Fitzgibbon 

Craig Robertson 
Emanuele Trucco 
Christopher K. I. Williams 




Second Edition 


WILEY 


Dictionary of Computer Vision 
and Image Processing 




R. B. Fisher 
University of Edinburgh, UK 
T. P. Breckon 
Durham University, UK 

K. Dawson-Howe 
Trinity College Dublin, Ireland 
A. Fitzgibbon 
Microsoft Research, UK 

C. Robertson 
Epipole Ltd., UK

E. Trucco 


University of Dundee, UK 


C. K. I. Williams 
University of Edinburgh, UK 


WILEY 


This edition first published 2014 
© 2014 John Wiley & Sons Ltd 


Registered office 
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United 
Kingdom 


For details of our global editorial offices, for customer services and for information about how to 
apply for permission to reuse the copyright material in this book please see our website at 
www.wiley.com. 


The right of the author to be identified as the author of this work has been asserted in 
accordance with the Copyright, Designs and Patents Act 1988. 


All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or 
transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or 
otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the 
prior permission of the publisher. 


Wiley also publishes its books in a variety of electronic formats. Some content that appears in 
print may not be available in electronic books. 


Designations used by companies to distinguish their products are often claimed as trademarks. 
All brand names and product names used in this book are trade names, service marks, trademarks 
or registered trademarks of their respective owners. The publisher is not associated with any 
product or vendor mentioned in this book. 


Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best 
efforts in preparing this book, they make no representations or warranties with respect to the 
accuracy or completeness of the contents of this book and specifically disclaim any implied 
warranties of merchantability or fitness for a particular purpose. It is sold on the understanding 
that the publisher is not engaged in rendering professional services and neither the publisher nor 
the author shall be liable for damages arising herefrom. If professional advice or other expert 
assistance is required, the services of a competent professional should be sought. 


Library of Congress Cataloging-in-Publication Data 


Dictionary of computer vision and image processing / R. B. Fisher, T. P. Breckon, 
K. Dawson-Howe, A. Fitzgibbon, C. Robertson, E. Trucco, C. K. I. Williams. - 2nd edition. 
pages cm 
Includes bibliographical references. 
ISBN 978-1-119-94186-6 (pbk.) 
1. Computer vision-Dictionaries. 2. Image processing-Dictionaries. I. Fisher, R. B. 
TA1634.I45 2014
006.3'703-dc23
2013022869 


A catalogue record for this book is available from the British Library. 
ISBN: 9781119941866 
Set in 9/10pt Garamond by Aptara Inc., New Delhi, India 


1 2014 


From Bob to Rosemary, 
Mies, Hannah, Phoebe 
and Lars 


From Toby to Alison, my 
parents and Amy 


From Ken to Jane, 
William and Susie 


From AWF to Liz, to my 
parents, and again to D 


From Craig to Karen, 
Aidan and Caitlin 


From Manuel to Emily, 
Francesca, and Alistair 


Contents

Preface
Numbers
A to Z
References


Preface


This dictionary arose out of a continu- 
ing interest in the resources needed by 
students and researchers in the fields of 
image processing, computer vision and 
machine vision (however you choose 
to define these overlapping fields). As 
instructors and mentors, we often found 
confusion about what various terms and 
concepts mean for the beginner. To sup- 
port these learners, we have tried to 
define the key concepts that a compe- 
tent generalist should know about these 
fields. 

This second edition adds approxi- 
mately 1000 new terms to the more than 
2500 terms in the original dictionary. We 
have chosen new terms that have entered 
reasonably common usage (e.g., those 
which have appeared in the index of 
influential books) and terms that were 
not included originally. We are pleased 
to welcome Toby Breckon and Chris 
Williams into the authorial team and to 
thank Andrew Fitzgibbon and Manuel 
Trucco for all their help with the first 
edition. 

One innovation in the second edition 
is the addition of reference links for a 
majority of the old and new terms. Unlike 
more traditional dictionaries, which pro- 
vide references to establish the origin 
or meaning of the word, our goal here 
was instead to provide further informa- 
tion about the term. 

Another innovation is to include a few 
videos for the electronic version of the 
dictionary. 

This is a dictionary, not an encyclo- 
pedia, so the definitions are necessarily 
brief and are not intended to replace a 
proper textbook explanation of the term. 
We have tried to capture the essentials of 
the terms, with short examples or math- 
ematical precision where feasible or nec- 
essary for clarity. 

Further information about many of the 
terms can be found in the references. 
Many of the references are to general
textbooks, each providing a broad view
of a portion of the field. Some of the
concepts are quite recent; although com- 
monly used in research publications, they 
may not yet have appeared in mainstream 
textbooks. Consequently, this book is also
a useful source for recent terminology 
and concepts. Some concepts are still 
missing from the dictionary, but we have 
scanned textbooks and the research liter- 
ature to find the central and commonly 
used terms. 

The dictionary was intended for begin- 
ning and intermediate students and 
researchers, but as we developed the dic- 
tionary it was clear that we also had some 
confusions and vague understandings of 
the concepts. It surprised us that some 
terms had multiple usages. To improve 
quality and coverage, each definition was 
reviewed during development by at least 
two people besides its author. We hope 
that this has caught any errors and vague- 
ness, as well as providing alternative 
meanings. Each of the co-authors is quite 
experienced in the topics covered here, 
but it was still educational to learn more 
about our field in the process of compil- 
ing the dictionary. We hope that you find 
using the dictionary equally valuable. 

To help the reader, terms appearing 
elsewhere in the dictionary are under- 
lined in the definitions. We have tried to 
be reasonably thorough about this, but 
some terms, such as 2D, 3D, light, cam- 
era, image, pixel, and color were so com- 
monly used that we decided not to cross- 
reference all of them. 

We have tried to be consistent with the
mathematical notation: italics for scalars
($s$), arrowed italics for points and vectors
($\vec{v}$), and bold for matrices ($\mathbf{M}$).

The authors would like to thank Xiang 
(Lily) Li, Georgios Papadimitriou, and Aris 
Valtazanos for their help with finding cita- 
tions for the content from the first edi- 
tion. We also greatly appreciate all the 
support from the John Wiley & Sons edi- 
torial and production team! 


1D: One dimensional, usually in ref- 
erence to some structure. Examples 
include: a signal $x(t)$ that is a function
of time $t$; the dimensionality of a single
property value; and one degree of
freedom in shape variation or motion. 
[Hec87:2.1] 


1D projection: The projection of data 
from a higher dimension to a single 
dimensional representation (line). 


1-norm: A specific case of the p-norm,
the sum of the absolute values of the
entries of a given vector $\vec{x}$ of length $n$:
$\|\vec{x}\|_1 = \sum_{i=1}^{n} |x_i|$. Also known as
the taxicab (Manhattan) norm or the
L1 norm. [Sho07]
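
As a small illustration of the definition, the sketch below computes the 1-norm of a vector held in a plain Python list. The function names `one_norm` and `p_norm` are chosen here for illustration only; `p_norm` is included simply to show that the 1-norm is the p = 1 case.

```python
def one_norm(x):
    """Return the 1-norm (sum of absolute values) of a sequence of numbers."""
    return sum(abs(v) for v in x)


def p_norm(x, p):
    """Return the p-norm of x for p >= 1; p = 1 gives the 1-norm."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


v = [3, -4, 1]
print(one_norm(v))   # 8
print(p_norm(v, 1))  # 8.0
```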


2D: Two dimensional. A space describ- 
able using any pair of orthogonal basis 
vectors consisting of two elements. 
[WP:Two-dimensional_space] 


2D coordinate system: A system 
uniquely associating two real numbers 
to any point of a plane. First, two 
intersecting lines (axes) are chosen 
on the plane, usually perpendicular to 
each other. The point of intersection 
is the origin of the system. Second, 
metric units are established on each 
axis (often the same for both axes) to
associate numbers to points. The coordinates
$P_x$ and $P_y$ of a point, $P$, are
obtained by projecting $P$ onto each
axis in a direction parallel to the other
axis and reading the numbers at the
intersections. [JKS95:1.4]

[Figure: a point P with coordinates P_x and P_y read off the x axis and y axis]


2D Fourier transform: A special case 
of the general Fourier transform often 
used to find structures in images. 
[FP03:7.3.1] 


2D image: A matrix of data represent- 
ing samples taken at discrete intervals. 
The data may be from a variety of 
sources and sampled in a variety of 
ways. In computer vision applications, 
the image values are often encoded 
color or monochrome intensity sam- 
ples taken by digital cameras but may 
also be range data. Some typical inten- 
sity values are: [SQ04:4.1.1] 


[Figure: image values, shown as a grid of two-digit intensity samples such as 10 09 08 09 20 31]


2D input device: A device for sampling 
light intensity from the real world into 
a 2D matrix of measurements. The 
most popular two-dimensional imag- 
ing device is the charge-coupled device 
(CCD) camera. Other common devices
are flatbed scanners and X-ray scanners.
[SQ04:4.2.1]


2D point: A point in a 2D space, 
i.e., characterized by two coordinates; 
most often, a point on a plane, e.g., 
an image point in pixel coordinates. 
Notice, however, that two coordinates 
do not necessarily imply a plane: a 
point on a 3D surface can be expressed 
either in 3D coordinates or by two 
coordinates given a surface parameter- 
ization (see surface patch). [JKS95:1.4] 


2D point feature: Localized structures 
in a 2D image, such as interest points, 
corners and line meeting points (e.g., 
X, Y and T shaped). One detector for 
these features is the SUSAN corner 
finder. [TV98:4.1] 


2D pose estimation: A special case 
of 3D pose estimation. A fundamen- 
tal open problem in computer vision 
where the correspondence between 
two sets of 2D points is found. The 
problem is defined as follows: Given 
two sets of points $\{\vec{x}_i\}$ and $\{\vec{y}_j\}$, find
the Euclidean transformation $\{\mathbf{R}, \vec{t}\}$
(the pose) and the match matrix
$\{M_{ij}\}$ (the correspondences) that best
relates them. A large number of tech- 
niques has been used to address this 
problem, e.g., tree-pruning methods, 
the Hough transform and geometric 
hashing. [HJL+89] 


2D projection: A transformation map- 
ping higher dimensional space onto 
two-dimensional space. The simplest 
method is to simply discard higher 
dimensional coordinates, although 
generally a viewing position is used 
and the projection is performed.

[Figure: points projected from a viewpoint onto a 2D space]

For example, the main steps for a
computer graphics projection are as
follows: apply normalizing transform
to 3D point world coordinates; clip 
against canonical view volume; project 
onto projection plane; transform into 
viewport in 2D device coordinates 
for display. Commonly used projection 
functions are parallel projection and 
perspective projection. [JKS95:1.4] 
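
The two projection functions named above can be sketched as follows. This is a minimal illustration under assumptions not stated in the entry: the viewpoint is at the origin looking along the +z axis, the projection plane is z = f, and the names `parallel_projection`, `perspective_projection` and `f` are introduced here.

```python
def parallel_projection(point):
    """Orthographic projection along the z axis: discard the z coordinate."""
    x, y, _z = point
    return (x, y)


def perspective_projection(point, f=1.0):
    """Pinhole projection onto the plane z = f, with the viewpoint at the origin."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the viewpoint (z > 0)")
    return (f * x / z, f * y / z)


p = (2.0, 4.0, 8.0)
print(parallel_projection(p))             # (2.0, 4.0)
print(perspective_projection(p, f=2.0))   # (0.5, 1.0)
```

Discarding a coordinate ignores depth entirely, whereas the perspective version scales each point by its distance from the viewpoint, so distant points land closer to the image center.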


2D shape descriptor (local): A com- 
pact summary representation of object 
shape over a localized region of an 
image. See shape descriptor. [Blu67] 


2D shape representation (global): A 
compact summary representation of 
image shape features over the entire 
image. See shape representation. 
[FP03:28.3] 


2D view: Planar aspect view or pla- 
nar projected view (such as an image 
under perspective projection) such 
that positions within its spatial repre- 
sentation can be indexed in two dimen- 
sions. [SB11:2.3.1] 


2.1D sketch: A lesser variant of the estab- 
lished 2.5D sketch, which captures 
the relative depth ordering of (pos- 
sibly self-occluding) scene regions in 
terms of their front-to-back relation- 
ship within the scene. By contrast, the 
2.5D sketch captures the relative scene 
depth of regions, rather than merely 
depth ordering: [NM90] 


[Figure: an image, its 2.1D sketch and its relative scene depth]

2.5D image: A range image obtained 
by scanning from a single viewpoint. 
It allows the data to be represented 
in a single image array, where each 
pixel value encodes the distance to 
the observed scene. The reason this 
is not called a 3D image is to make 
explicit the fact that the back sides of 
the scene objects are not represented. 
[SQ04:4.1.1] 


2.5D model: A geometric model repre- 
sentation corresponding to the 2.5D 
image representation used in the 
model to (image) data matching
problem of model-based recognition.
[Mar82]

[Figure: an example 2.5D model]


2.5D sketch: Central structure of Marr’s 
theory of vision. An intermediate
description of a scene indicating the 
visible surfaces and their arrangement 
with respect to the viewer. It is built 
from several different elements: the 
contour, texture and shading informa- 
tion coming from the primal sketch, 
stereo information and motion. The 
description is theorized to be a kind 
of buffer where partial resolution of 
the objects takes place. The name 
2.5D sketch stems from the fact that, 
although local changes in depth and 
discontinuities are well resolved, the 
absolute distance to all scene points 
may remain unknown. [FP03:11.3.2] 


3D: Three dimensional. A space describ- 
able using any triple of mutu- 
ally orthogonal basis vectors consist- 
ing of three elements. [WP:Three- 
dimensional_space] 


3D coordinate system: Same as 2D 
coordinate system but in three dimen- 
sions: [JKS95:1.4] 


[Figure: the axes of a 3D coordinate system]


3D data: Data described in all 
three spatial dimensions. See also 
range data, CAT and NMR. [WP: 
3D_data_acquisition_and_object_ 
reconstruction] An example of a 3D 
data set is: 


3D data acquisition: Sampling data in 
all three spatial dimensions. There is 
a variety of ways to perform this 
sampling, e.g., using structured light 
triangulation. [FP03:21.1] 


3D image: See range image. 


3D imaging: Any of a class of techniques 
that obtain three-dimensional informa- 
tion using imaging equipment. Active 
vision techniques generally include a 
source of structured light (or other 
electromagnetic or sonar radiation) 
and a sensor, such as a camera 
or a microphone. Triangulation and 
time-of-flight computations allow the 
distance from the sensor system to 
be computed. Common technologies 
include laser scanning, texture projec- 
tion systems and moiré fringe methods. 
Passive sensing in 3D depends only 
on external (and hence unstructured) 
illumination sources. Examples of such 
systems are stereo reconstruction and 
shape from focus techniques. See also 
3D surface imaging and 3D volumetric 
imaging. [FMN+91] 


3D interpretation: A 3D model, e.g., 
a solid object that explains an image 
or a set of image data. For instance, 
a certain configuration of image lines 
can be explained as the perspective 
projection of a polyhedron; in simpler 
words, the image lines are the images 
of some of the polyhedron’s lines. See 
also image interpretation. [BB82:9.1] 


3D model: A description of a 3D 
object that primarily describes its
shape. Models of this sort are regularly
used as exemplars in model-based
recognition and 3D computer graph- 
ics. [TV98:10.6] 


3D model-based tracking: An exten- 
sion of model-based tracking using 
a 3D model of the tracked object. 
[GX11:5.1.4] 


3D moments: A special case of moment 
where the data comes from a set of 3D 
points. [GC93] 


3D motion estimation: An extension 
of motion estimation whereby the 
motion is estimated as a displacement 
vector $\vec{d}$ in $\mathbb{R}^3$. [LRF93]


3D motion segmentation: An exten- 
sion to motion segmentation whereby 
motion is segmented within an $\mathbb{R}^3$
dataset. [TV07] 


3D object: A subset of $\mathbb{R}^3$. In computer
vision, often taken to mean a volume
in $\mathbb{R}^3$ that is bounded by a surface. Any
solid object around you is an example: 
table, chairs, books, cups; even your- 
self. [BB82:9.1] 


3D point: An infinitesimal volume of 3D 
space. [JKS95:1.4] 


3D point feature: A point feature on a 
3D object or in a 3D environment. For 
instance, a corner in 3D space. [RBB09] 


3D pose estimation: The process of 
determining the transformation (trans- 
lation and rotation) of an object in 
one coordinate frame with respect to 
another coordinate frame. Generally, 
only rigid objects are considered; mod- 
els of those objects exist a priori 
and we wish to determine the posi- 
tion of the object in an image on 
the basis of matched features. This 
is a fundamental open problem in 
computer vision where the correspon- 
dence between two sets of 3D points 
is found. The problem is defined as follows:
Given two sets of points $\{\vec{x}_i\}$ and
$\{\vec{y}_j\}$, find the parameters of a Euclidean
transformation $\{\mathbf{R}, \vec{t}\}$ (the pose) and
the match matrix $\{M_{ij}\}$ (the correspondences)
that best relates them. Assuming
the points correspond, they should
match exactly under this transforma- 
tion. [TV98:11.2] 


3D reconstruction: The recovery of 
3D scene information and organization 
into a 3D shape via e.g., multi-view 
geometry: [HZ00:Ch. 10] 


3D shape descriptor: An extension to 
regular shape descriptor approaches to 
consider object shape in $\mathbb{R}^3$. [Pri12:Ch. 17]


3D shape representation: A com- 
pact summary representation of shape 
extending shape representation to consider
object shape in $\mathbb{R}^3$. [Pri12:Ch. 17]


3D SIFT: A 3D extension of the SIFT oper- 
ator defined for use over voxel data. 
[FBM10] 


3D skeleton: A 3D extension of an image 
skeleton defining a tree-like structure 
of the medial axes of a 3D object 
(akin to the form of a human stick fig- 
ure in the case of considering a per- 
son as a 3D object). See also medial 
axis skeletonization. [Sze10:12.6]

[Figure: an example 3D skeleton]

3D stratigraphy: A modeling and visu- 
alization tool used to display differ- 
ent underground layers. Often used for 
visualizations of archaeological sites or
for detecting rock and soil structures
in geological surveying. [PKVG00]


3D structure recovery: See 3D
reconstruction.


3D SURF: A 3D extension to the SURF
descriptor that considers the characterization
of local image regions
in $\mathbb{R}^3$ via either a volumetric voxel-based
or a surface-based representation.
[KPW+10]


3D surface imaging: Obtaining surface 
information embedded in a 3D space. 
See also 3D imaging and 3D volumetric
imaging.


3D texture: The appearance of texture 
on a 3D surface when imaged, e.g., the 
fact that the density of texels varies 
with distance because of perspective 
effects. 3D surface properties (e.g., 
shape, distances and orientation) can 
be estimated from such effects. See 
also shape from texture and texture 
orientation. [DN99] 


3D vision: A branch of computer vision 
dealing with characterizing data com- 
posed of 3D measurements. This may 
involve segmentation of the data into 
individual surfaces that are then used 
to identify the data as one of several 
models. Reverse engineering is a spe- 
cialism in 3D vision. [Dav90:16.2] 


3D volumetric imaging: Obtaining 
measurements of scene properties at 
all points in a 3D space, includ- 
ing the insides of objects. This is 
used for inspection but more com- 
monly for medical imaging. Tech- 
niques include nuclear magnetic 
resonance, computerized tomography, 
positron emission tomography and 
single photon emission computed 
tomography. See also 3D imaging and 
3D surface imaging. 


4 connectedness: A type of image 
connectedness in which each rectan- 
gular pixel is considered to be con- 
nected to the four neighboring pixels 
that share a common crack edge. See
also 8 connectedness. [SQ04:4.5]

[Figure: four pixels connected to a central pixel, and four groups of object pixels joined by 4 connectedness]


4D approach: An approach or solution 
to a given problem that utilizes both 
3D-spatial and temporal information. 
See 4D representation (3D-spatial +
time).


4D representation (3D-spatial + 
time): A 3D time series data representation
whereby 3D scene information
is available over a temporal
sequence. An example would be a 
video sequence obtained from stereo 
vision or some other form of depth 
sensing: [RG08:Ch. 2] 


8 connectedness: A type of image
connectedness in which each rectangular
pixel is considered to be connected
to all eight neighboring pixels.
See also 4 connectedness. [SQ04:4.5]

[Figure: eight pixels connected to a central pixel, and two groups of object pixels joined by 8 connectedness]

8-point algorithm: An approach for the 
recovery of the fundamental matrix 
using a set of eight feature point 
correspondences for stereo camera 
calibration. [HZ00:11.2] 


A*: A search technique that performs
best-first searching based on an evaluation
function that combines the cost so
far and the estimated cost to the goal. 
[WP:A*_search_algorithm] 
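
A minimal sketch of the idea, assuming a 4-connected grid with unit step costs and the Manhattan distance as the estimated cost to the goal. The function name `a_star`, the grid encoding (1 marks an obstacle) and the example map are illustrative choices, not part of the definition.

```python
import heapq


def a_star(grid, start, goal):
    """Return the length of a shortest 4-connected path on a grid, or None.

    The evaluation function is f = g + h, where g is the cost so far and
    h is the Manhattan distance to the goal (an estimate that never
    overestimates on a unit-cost grid).
    """
    def h(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    rows, cols = len(grid), len(grid[0])
    best_g = {start: 0}
    frontier = [(h(start), 0, start)]       # entries are (f, g, cell)
    while frontier:
        _f, g, cell = heapq.heappop(frontier)
        if cell == goal:
            return g
        if g > best_g.get(cell, float("inf")):
            continue                        # stale queue entry, skip it
        r, c = cell
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(frontier, (ng + h((nr, nc)), ng, (nr, nc)))
    return None


grid = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
print(a_star(grid, (0, 0), (2, 0)))  # 6
```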


a posteriori probability: Literally,
“after” probability. It is the proba- 
bility p(s|e) that some situation s 
holds after some evidence e has been 
observed. This contrasts with the 
a priori probability, p(s), the prob- 
ability of s before any evidence is 
observed. Bayes’ rule is often used 
to compute the a posteriori proba- 
bility from the a priori probability 
and the evidence. See also posterior 
distribution. [JKS95:15.5] 
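
As a worked illustration of Bayes' rule for a situation that either holds or does not, the sketch below computes p(s|e) from the a priori probability p(s) and the two likelihoods p(e|s) and p(e|not s). The function name `posterior` and the numbers are hypothetical and chosen only for the example.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """Return p(s|e) from p(s), p(e|s) and p(e|not s) using Bayes' rule."""
    evidence = likelihood * prior + likelihood_given_not * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical values: a rare situation and fairly reliable evidence.
p = posterior(prior=0.01, likelihood=0.9, likelihood_given_not=0.05)
print(round(p, 4))  # 0.1538
```

Even with strong evidence, a small a priori probability keeps the a posteriori probability modest.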


a priori probability: A probability
distribution that encodes an agent’s 
beliefs about some uncertain quan- 
tity before some evidence or data is 
taken into account. See also prior 
distribution. [Bis06:1.2.3]


aberration: Problem exhibited by a
lens or a mirror whereby unexpected
results are obtained. Two types of aberration
are commonly encountered:
chromatic aberration, where different
frequencies of light focus at different
positions, and spherical aberration, where light
passing through the edges of a lens
(or mirror) focuses at slightly different
positions. [FP03:1.2.3]

[Figure: chromatic aberration]


absolute conic: The conic in 3D
projective space that is the intersec- 
tion of the unit (or any) sphere with 
the plane at infinity. It consists only 
of complex points. Its importance in 
computer vision is because of its role 
in the problem of autocalibration: the 
image of the absolute conic (IAC), a 
2D conic, is represented by a 3 x 3
matrix $\omega$ that is the inverse of the
matrix $\mathbf{K}\mathbf{K}^\top$, where $\mathbf{K}$ is the matrix
of the internal parameters for camera
calibration. Consequently, identifying
$\omega$ allows the camera calibration to be
computed. [FP03:13.6]


absolute coordinates: Generally used
in contrast to local or relative coordi- 
nates. A coordinate system that is ref- 
erenced to some external datum. For 
example, a pixel in a satellite image 
might be at (100,200) in image coor- 
dinates, but at (51:48:05N, 8:17:54W) 
in georeferenced absolute coordinates. 
[JKS95:1.4.2] 


absolute orientation: In photogrammetry,
the problem of registration of
two corresponding sets of 3D points. 
Used to register a photogrammet- 
ric reconstruction to some absolute 
coordinate system. Often expressed 
as the problem of determining the 
rotation $\mathbf{R}$, translation $\vec{t}$ and scale $s$
that best transforms a set of model
points $\{\vec{m}_1, \ldots, \vec{m}_n\}$ to corresponding
data points $\{\vec{d}_1, \ldots, \vec{d}_n\}$ by minimizing
the least-squares error

$$\epsilon(\mathbf{R}, \vec{t}, s) = \sum_{i=1}^{n} \| \vec{d}_i - s(\mathbf{R}\vec{m}_i + \vec{t}) \|^2$$

to which a solution may be found
by using singular value decomposition.
[JKS95:1.4.2]
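
One common closed-form route to a singular value decomposition solution is sketched below. It is an illustration under stated assumptions rather than the procedure of the cited reference: the error is taken to be $\sum_i \| \vec{d}_i - s(\mathbf{R}\vec{m}_i + \vec{t}) \|^2$, at least three non-collinear point pairs are available, and the centroids $\bar{m}$ and $\bar{d}$, the centered points $\vec{m}'_i$ and $\vec{d}'_i$, the matrix $\mathbf{H}$ and the sign-correction matrix $\mathbf{D}$ are names introduced here.

```latex
\begin{align*}
\bar{m} &= \tfrac{1}{n}\sum_{i=1}^{n}\vec{m}_i, \qquad
\bar{d} = \tfrac{1}{n}\sum_{i=1}^{n}\vec{d}_i, \qquad
\vec{m}'_i = \vec{m}_i - \bar{m}, \qquad
\vec{d}'_i = \vec{d}_i - \bar{d} \\
\mathbf{H} &= \sum_{i=1}^{n} \vec{m}'_i \, (\vec{d}'_i)^{\top}
            = \mathbf{U}\,\mathbf{S}\,\mathbf{V}^{\top}
            \quad \text{(singular value decomposition)} \\
\mathbf{D} &= \operatorname{diag}\bigl(1,\; 1,\; \det(\mathbf{V}\mathbf{U}^{\top})\bigr), \qquad
\mathbf{R} = \mathbf{V}\,\mathbf{D}\,\mathbf{U}^{\top} \\
s &= \frac{\operatorname{tr}(\mathbf{S}\,\mathbf{D})}{\sum_{i=1}^{n} \|\vec{m}'_i\|^{2}}, \qquad
\vec{t} = \tfrac{1}{s}\,\bar{d} - \mathbf{R}\,\bar{m}
\end{align*}
```

Centering both point sets removes the translation from the problem, the decomposition then gives the rotation that best aligns the centered sets, and $\mathbf{D}$ guards against returning a reflection instead of a rotation.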


absolute point: A 3D point defining the
origin of a coordinate system. [WP: 
Cartesian_coordinate_system] 


absolute quadric: The symmetric 4 x 4
rank 3 matrix $\Omega = \begin{pmatrix} \mathbf{I}_3 & \vec{0}_3 \\ \vec{0}_3^\top & 0 \end{pmatrix}$. Like the
absolute conic, it is defined to be invariant
under Euclidean transformations, is
rescaled under similarities, takes the
form $\Omega = \begin{pmatrix} \mathbf{A}^\top\mathbf{A} & \vec{0}_3 \\ \vec{0}_3^\top & 0 \end{pmatrix}$ under affine
transforms and becomes an arbitrary
4 x 4 rank 3 matrix under projective
transforms. [FP03:13.6]


absorption: Attenuation of light caused
by passing through an optical system 
or being incident on an object surface. 
[Hec87:3.5] 


accumulation method: A method of
accumulating evidence in histogram 
form, then searching for peaks, which 
correspond to hypotheses. See also 
Hough transform and generalized 
Hough transform. [Low91:9.3] 


accumulative difference: A means of
detecting motion in image sequences. 
Each frame in the sequence is com- 
pared to a reference frame (after 
registration if necessary) to produce a 
difference image. Thresholding the dif- 
ference image gives a binary motion 
mask. A counter for each pixel loca- 
tion in the accumulative image is 
incremented every time the differ- 
ence between the reference image 
and the current image exceeds some 
threshold. Used for change detection. 
[JKS95:14.1.1]
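
The per-pixel counter described above can be sketched as follows, assuming grayscale frames stored as nested lists that are already registered to the reference frame. The function name `accumulative_difference`, the threshold value and the toy frames are illustrative assumptions.

```python
def accumulative_difference(reference, frames, threshold):
    """Count, per pixel, how many frames differ from the reference image."""
    rows, cols = len(reference), len(reference[0])
    acc = [[0] * cols for _ in range(rows)]
    for frame in frames:
        for r in range(rows):
            for c in range(cols):
                # Increment the counter where the difference exceeds the threshold.
                if abs(frame[r][c] - reference[r][c]) > threshold:
                    acc[r][c] += 1
    return acc


reference = [[10, 10, 10], [10, 10, 10]]
frames = [
    [[10, 50, 10], [10, 10, 10]],
    [[10, 10, 50], [10, 10, 10]],
    [[10, 10, 52], [10, 12, 10]],
]
print(accumulative_difference(reference, frames, threshold=20))
# [[0, 1, 2], [0, 0, 0]]
```

Pixels with high counts have differed from the reference in many frames, while small fluctuations below the threshold (such as the change from 10 to 12) are ignored.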


accuracy: The error of a value away
from the true value. Contrast this 
with precision. [WP:Accuracy_and_ 
precision] 


acoustic sonar: Sound Navigation And
Ranging. A device that is used primar- 
ily for the detection and location of 
objects (e.g., underwater or in air, as 
in mobile robotics, or internal to a 
human body, as in medical ultrasound) 
by reflecting and intercepting
acoustic waves. It operates with
acoustic waves in a way analogous to 
that of radar, using both the time of 
flight and Doppler effects, giving the 
radial component of relative position 
and velocity. [WP:Sonar] 


ACRONYM: A vision system developed 
by Brooks that attempted to recognize 
three-dimensional objects from two- 
dimensional images, using generalized 
cylinder primitives to represent both 
stored model and objects extracted 
from the image. [Nev82:10.2] 


action cuboid: The 3D spatio-temporal 
space in which an action detection may
be localized in a video sequence.
Analogous to a window (or region of
interest) in which an object detection 
may be localized within a 2D image. 
[GX11:6.4] 


action detection: An approach to the 
automated detection of a given human, 
vehicle or animal activity (action) from 
imagery. Most commonly carried out 
as a video analysis task due to the tem- 
poral nature of actions. [Sze10:12.6.4] 


action localization: An approach 
to in-image or in-scene positional 
localization of a given human, vehicle 
or animal activity. See also action 
detection. [Sze10:12.6.4] 


action model: A pre-defined or learned 
model of a given human action which 
is matched against a given unseen 
action instance to perform action 
recognition or action detection. Akin 
to the use of models in model-based
object recognition. [NWF08]


action recognition: Similar to action 
detection but further considering the
classification of actions (e.g., walking,
running, kicking, lifting, stretching).
See also activity recognition and
behavior classification, of which action
recognition is often a sub-task, i.e., an
activity or behavior is considered as a
series of actions.
Commonly the terms action, activity
and behavior are used interchangeably
in the literature. [Sze10:12.6.4]


action representation: A model-based 
approach whereby an action is rep- 
resented as a spatio-temporal feature 
vector over a given video sequence. 
[GX11:Ch. 6] 


action unit: The smallest atom or mea- 
surement of action within an action 
sequence or action representation 
removed from the raw measurement 
of pixel movement itself (e.g., optical 
flow). [LJ11:18.2.2] 


active appearance model: A generaliza- 
tion of the widely used active shape 
model approach that includes all of 
the information in the image region 
covered by the target object, rather 
than just that near modeled edges. 
The active appearance model has a sta- 
tistical model of the shape and gray- 
level appearance of the object of inter- 
est. This statistical model generalizes 
to cover most valid examples. Match- 
ing to an image involves finding model 
parameters that minimize the differ- 
ence between the image and a synthe- 
sized model example, projected into 
the image. [NA05:6.5] 


active blob: A region-based approach 
to the tracking of non-rigid motion 
in which an active shape model is 
used. The model is based on an initial 
region that is divided using Delaunay
triangulation and then each patch is
tracked from frame to frame (note that 
the patches can deform). [SI98] 


active calibration: An approach to
camera calibration that uses naturally 
occurring features within the scene 
with active motion of the camera to 
perform calibration. By contrast, tradi- 
tional approaches assume a static cam- 
era and a predefined calibration object 
with fixed features. [Bas95] 


active contour model: A technique
used in model-based vision where 
object boundaries are detected using a 
deformable curve representation such 
as a snake. The term “active” refers to 
the ability of the snake to deform shape 
to better match the image data. See also 
active shape model. [SQ04:8.5] 


active contour tracking: A technique
used in model-based vision for tracking 
object boundaries in a video sequence 
using active contour models. [LL93] 


active illumination: A system of lighting
where intensity, orientation or pattern
may be continuously controlled
and altered. This kind of system may 
be used to generate structured light. 
[CS09:1.2] 


active learning: A machine-learning
approach in which the learning agent 
can actively query the environment for 
data examples. For example, a classifi- 
cation approach may recognize that it 
is less reliable over a certain sub-region 
of the input example space and thus 
request more training examples that 
characterize inputs for that sub-region. 
Considered to be a supervised learning 
approach. [Bar12:13.1.5] 


active net: An active shape model that
parameterizes a triangulated mesh. 
[TY89] 


active recognition: An approach
to object recognition or scene 
classification in which the recognition 
agent or algorithm collects further 
evidence samples (e.g., more images 
after moving) until a sufficient level 
of confidence is obtained to make a 
decision on identification. See also 
active learning. [RSB04] 


active sensing: 1) A sensing activity car- 
ried out in an active or purposive way, 
e.g., where a camera is moved in space 
to acquire multiple or optimal views 
of an object (see also active vision, 
purposive vision and sensor planning). 
2) A sensing activity implying the pro- 
jection of a pattern of energy, e.g., a 
laser line, onto the scene (see also laser 
stripe triangulation and structured 
light triangulation). [FP03:21.1]


active shape model: Statistical model 
of the shape of an object that can 
deform to fit a new example of the 
object. The shapes are constrained by 
a statistical shape model so that they 
may vary only in ways seen in a train- 
ing set. The models are usually formed 
using principal component analysis to 
identify the dominant modes of shape 
variation in observed examples of the 
shape. Model shapes are formed by 
linear combinations of the dominant 
modes. [WP:Active_shape_model] 
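
The statement that model shapes are linear combinations of the dominant modes can be sketched as below. This is a minimal illustration with assumptions: shapes are flattened coordinate lists (x0, y0, x1, y1, ...), the modes and eigenvalues would normally come from principal component analysis of a training set but are hand-written here, and clamping each coefficient to plus or minus three standard deviations is one commonly used convention rather than part of the definition. The name `synthesize_shape` is hypothetical.

```python
def synthesize_shape(mean_shape, modes, eigenvalues, b, limit=3.0):
    """Return mean_shape + sum_k b[k] * modes[k], with each b[k] clamped.

    Each coefficient is restricted to +/- limit * sqrt(eigenvalue) so that
    the generated shape stays close to the variation seen in training.
    """
    shape = list(mean_shape)
    for mode, lam, bk in zip(modes, eigenvalues, b):
        bound = limit * lam ** 0.5
        bk = max(-bound, min(bound, bk))
        for i, m in enumerate(mode):
            shape[i] += bk * m
    return shape


# Unit square as the mean shape, with one (unit-length) mode that stretches it horizontally.
mean_shape = [0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0]
modes = [[-0.5, 0.0, 0.5, 0.0, 0.5, 0.0, -0.5, 0.0]]
eigenvalues = [4.0]

print(synthesize_shape(mean_shape, modes, eigenvalues, b=[2.0]))
# [-1.0, 0.0, 2.0, 0.0, 2.0, 1.0, -1.0, 1.0]
```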


active stereo: An alternative approach to 
traditional binocular stereo. One of the 
cameras is replaced with a structured 
light projector, which projects light 
onto the object of interest. If the 
camera calibration is known, the 
triangulation for computing the 3D 
coordinates of object points simply 
involves finding the intersection of a 
ray and known structures in the light 
field. [CS09:1.2]
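The triangulation step can be sketched as a ray-plane intersection. This is a minimal illustration, assuming the projector casts a single light plane whose equation n . X = d is known in the camera frame, and that calibration gives the back-projected ray direction for each pixel. The function name and the numbers are hypothetical.

```python
def intersect_ray_with_light_plane(ray_dir, plane_normal, plane_offset):
    """Return the 3D point where a camera ray meets the projected light plane.

    The camera centre is assumed to be the origin, so points on the ray are
    t * ray_dir. The light plane is {X : plane_normal . X = plane_offset}.
    """
    denom = sum(n * r for n, r in zip(plane_normal, ray_dir))
    if abs(denom) < 1e-12:
        raise ValueError("ray is parallel to the light plane")
    t = plane_offset / denom
    return tuple(t * r for r in ray_dir)


# Hypothetical setup: light plane x = 0.5 and a pixel whose back-projected
# ray direction is (0.25, 0.1, 1.0).
point = intersect_ray_with_light_plane((0.25, 0.1, 1.0), (1.0, 0.0, 0.0), 0.5)
print(point)
```

Sweeping the plane (changing the offset or normal over time) and repeating the intersection for every lit pixel would give a depth map under these assumptions.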


active structure from X: The recovery
of scene depth (i.e., 3D structure) via
an active sensing technique, such as
shape from X techniques plus motion.
[Sze10:12.2] The figure shows the shape
from structured light method, with
the light plane being swept along the
object:


active surface: 1) A surface determined 
using a range sensor; 2) an active shape 
model that deforms to fit a surface. 
[WP:Active_surface] 


active triangulation: Determination 
of surface depth by triangulation 
between a light source at a known 
position and a camera that observes 
the effects of the illuminant on the 
scene. Light stripe ranging is one form 
of active triangulation. A variant is to 
use a single scanning laser beam to 
illuminate the scene and a stereo pair 
of cameras to compute depth. [WP: 
3D_scanner#Triangulation] 


active vision: An approach to computer 
vision in which the camera or sensor 
is moved in a controlled manner, so 
as to simplify the nature of a problem. 
For example, rotating a camera with 
constant angular velocity while main- 
taining fixation at a point allows abso- 
lute calculation of scene point depth, 
instead of relative depth that depends 
on the camera speed. See also kinetic 
depth. [Nal93:10] 


active volume: The volume of inter- 
est in a machine vision application. 
[SZH+10:Ch. 1] 


activity: A temporal sequence of actions 
performed by an entity (e.g., a person, 
animal or vehicle) indicative of a given 
task, behavior or intended goal. See 
activity classification. [Sze10:12.6.4] 


activity analysis: Analyzing the behav- 
ior of people or objects in a video 
sequence, for the purpose of identi- 
fying the immediate actions occurring 
or the long-term sequence of actions, 
e.g., detecting potential intruders in 
a restricted area. [WP:Occupational_ 
therapy#Activity_analysis] 


activity classification: The 
classification of a given temporal 
sequence of actions forming a given 
activity into a discrete set of labels. 
[Sze10:12.6.4] 


activity graph: A graph encoding the
activity transition matrix where each
node in the graph corresponds to an
activity (or stage of an activity) and
the arcs among nodes represent the
allowable next activities (or stages):
[GX11:7.3.2]


activity model: A representation of 
a given activity used for activity 
classification via an approach akin to 
that of model-based object recognition. 


activity recognition: See activity
classification.


activity representation: See activity 
model. 


activity segmentation: The task of 
segmenting a video sequence into a 
series of sub-sequences based on vari- 
ations in activity performed along that 
sequence. [GX11:7.2] 


activity transition matrix: An N x N
matrix, for a set of N different activi- 
ties, where each entry corresponds to 
the transition probability between two 
states and each state is itself an activity 
being performed within the scene. 
See also state transition probability: 
[GX11:7.2] 


Activity   Walk   Browse   Meet   Fight
Walk       0.9    0.84     0.63   0.4
Browse     0.3    0.78     0.73   0.2
Meet       0.74   0.79     0.68   0.28
Fight      0.32   0.45     0.23   0.60
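One way such a matrix might be estimated is by counting consecutive label pairs in an annotated sequence. The sketch below assumes a row-normalized convention (each row holds the probabilities of the next activity given the current one); the label sequence and function name are hypothetical.

```python
from collections import Counter


def estimate_transition_matrix(sequence, activities):
    """Estimate P[a][b], the probability that activity b follows activity a."""
    counts = Counter(zip(sequence, sequence[1:]))
    matrix = {}
    for a in activities:
        total = sum(counts[(a, b)] for b in activities)
        # Rows with no observed outgoing transitions are left as zeros.
        matrix[a] = {b: (counts[(a, b)] / total if total else 0.0)
                     for b in activities}
    return matrix


activities = ["Walk", "Browse", "Meet", "Fight"]
# Hypothetical per-frame activity labels for one tracked person.
labels = ["Walk", "Walk", "Browse", "Browse", "Walk", "Meet", "Meet", "Fight"]
P = estimate_transition_matrix(labels, activities)
print(P["Walk"])
```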


acuity: The ability of a vision system
to discriminate (or resolve) between
closely arranged visual stimuli. This
can be measured using a grating, i.e.,
a pattern of parallel black and white
stripes of equal widths. Once the
bars become too close, the grating
becomes indistinguishable from a uniform
image of the same average intensity
as the bars. Under optimal lighting,
the minimum spacing that a person
can resolve is 0.5 min of arc.
[Umb98:7.6]


AdaBoost: An Adaptive Boosting 
approach for ensemble learning 
whereby the (weak) classifiers are 
trained in sequence such that the nth 
classifier is trained over a training set 
re-weighted to give greater emphasis 
to training examples upon which the 
previous (n - 1) classifiers performed
poorly. See boosting. [Bis06:14.3] 
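The re-weighting step can be illustrated with the commonly stated discrete AdaBoost update for labels in {+1, -1}. The formulas for the classifier weight and the example weights are the standard textbook ones, not taken from this entry; the data are hypothetical.

```python
import math


def adaboost_reweight(weights, labels, predictions):
    """One AdaBoost round: return (alpha, new_weights).

    labels and predictions are +1/-1; weights are assumed to sum to 1.
    """
    error = sum(w for w, y, h in zip(weights, labels, predictions) if y != h)
    if not 0.0 < error < 0.5:
        raise ValueError("weak classifier must have weighted error in (0, 0.5)")
    alpha = 0.5 * math.log((1.0 - error) / error)
    # Misclassified examples (y * h = -1) are scaled up, the rest down.
    updated = [w * math.exp(-alpha * y * h)
               for w, y, h in zip(weights, labels, predictions)]
    total = sum(updated)
    return alpha, [w / total for w in updated]


labels = [1, 1, -1, -1, 1]
predictions = [1, 1, -1, -1, -1]   # the last example is misclassified
weights = [0.2] * 5
alpha, new_weights = adaboost_reweight(weights, labels, predictions)
print(alpha, new_weights)
```

After the update the misclassified example carries half of the total weight, so the next weak classifier is pushed to handle it.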


adaptation: See adaptive. 


adaptive: The property of an algorithm 
to adjust its parameters to the data 
at hand in order to optimize perfor- 
mance. Examples include adaptive 
contrast enhancement, adaptive 
filtering and adaptive smoothing. [WP: 
Adaptive_algorithm] 


adaptive behavior model: A behavior 
model exhibiting adaptive properties 
that facilitate the online updating of the 
model used for behavior analysis. Gen- 
erally follows a three-stage process: 
model initialization, online anomalous 
behavior detection and online model 
updating via unsupervised learning. 
See also unsupervised behavior 
modeling. [GX11:8.3] 


adaptive bilateral filter: A variant 
on bilateral filtering used as an 
image-sharpening operator with simul- 
taneous noise removal. Performs image 
sharpening by increasing the overall 
“slope” (i.e., the gradient range) of
the edges without producing over- 
shoot or undershoot associated with 
the unsharp operator. [ZA08] 


adaptive coding: A scheme for the 
transmission of signals over unreliable 
channels, e.g., wireless links. Adap- 
tive coding varies the parameters of 
the encoding to respond to changes 
in the channel, e.g., “fading”, where 
the signal-to-noise ratio degrades.
[WP:Adaptive_coding]


adaptive contrast enhancement:
An image processing operation that
applies histogram equalization
locally across an image.
[WP:Adaptive_histogram_equalization]


adaptive edge detection: Edge 
detection with adaptive thresholding 
of the gradient magnitude image. 
[Nal93:3.1.2] 


adaptive filtering: In signal process- 
ing, any filtering process in which the 
parameters of the filter change over 
time or where the parameters are dif- 
ferent at different parts of the signal or 
image. [WP:Adaptive_filter] 


adaptive histogram equalization: A 
localized method of improving image 
contrast. A histogram is constructed 
of the gray levels present. These gray 
levels are re-mapped so that the his- 
togram is approximately flat. It can be 
made perfectly flat by dithering: [WP: 
Adaptive_histogram_equalization] 


[Figure: original (left) and after adaptive histogram equalization (right)]


adaptive Hough transform: A Hough 
transform method that iteratively 
increases the resolution of the param- 
eter space quantization. It is partic- 
ularly useful for dealing with high- 
dimensional parameter spaces. Its dis- 
advantage is that sharp peaks in the 
histogram can be missed. [NA05:5.6] 


adaptive meshing: Methods for creat- 
ing simplified meshes where elements 
are made smaller in regions of high 
detail (rapid changes in surface orientation)
and larger in regions of low
detail, such as planes.
[WP:Adaptive_mesh_refinement]


adaptive pyramid: A method of multi- 
scale processing where small areas of 
image having some feature in com- 
mon (e.g., color) are first extracted into 
a graph representation. This graph is 
then manipulated, e.g., by pruning or 
merging, until the level of desired scale 
is reached. [JM92] 


adaptive reconstruction: Data-driven 
methods for creating statistically signif- 
icant data in areas of a 3D data cloud 
where data may be missing because of 
sampling problems. [YGK95] 


adaptive smoothing: An iterative 
smoothing algorithm that avoids 
smoothing over edges. Given an image 
I(x, y), one iteration of adaptive 
smoothing proceeds as follows: 
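1. Compute the gradient magnitude image
$G(x, y) = |\nabla I(x, y)|$.

2. Make the weights image
$W(x, y) = e^{-\lambda G(x, y)}$.

3. Smooth the image:
$S(x, y) = \frac{\sum_{i=-1}^{1} \sum_{j=-1}^{1} A_{xyij}}{\sum_{i=-1}^{1} \sum_{j=-1}^{1} B_{xyij}}$
where
$A_{xyij} = I(x+i, y+j)\, W(x+i, y+j)$,
$B_{xyij} = W(x+i, y+j)$.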
[WP:Additive_smoothing] 
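One iteration of this scheme can be sketched in plain Python. The sketch assumes weights of the form exp(-lam * G), central differences for the gradient and replicated borders; these choices, the parameter name lam and the test image are assumptions made for illustration.

```python
import math


def adaptive_smoothing_step(image, lam=1.0):
    """Apply one adaptive smoothing iteration to a 2D list of floats."""
    h, w = len(image), len(image[0])

    def px(y, x):
        # Replicate border pixels (an assumption; the entry does not say).
        return image[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    # Step 1: gradient magnitude by central differences.
    # Step 2: weights W = exp(-lam * G), which are small on strong edges.
    weight = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = 0.5 * (px(y, x + 1) - px(y, x - 1))
            gy = 0.5 * (px(y + 1, x) - px(y - 1, x))
            weight[y][x] = math.exp(-lam * math.hypot(gx, gy))

    # Step 3: weighted average over each 3x3 neighbourhood.
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            num = den = 0.0
            for j in (-1, 0, 1):
                for i in (-1, 0, 1):
                    yy, xx = y + j, x + i
                    if 0 <= yy < h and 0 <= xx < w:
                        num += image[yy][xx] * weight[yy][xx]
                        den += weight[yy][xx]
            out[y][x] = num / den
    return out


step_edge = [[0.0, 0.0, 0.0, 10.0, 10.0, 10.0] for _ in range(4)]
smoothed = adaptive_smoothing_step(step_edge, lam=1.0)
print([round(v, 3) for v in smoothed[1]])
```

On this step edge the two pixels either side of the edge stay close to 0 and 10, whereas a plain 3x3 mean would pull them to about 3.3 and 6.7.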


adaptive thresholding: An improved 
image thresholding technique where 
the threshold value varies at each pixel. 
A common technique is to use the aver- 
age intensity in a neighbourhood to set 
the threshold: [Dav90:4.4] 


[Figure panels: Image I; Smoothed S; Thresholded I > S - 6]
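A minimal sketch of local-mean thresholding follows. It marks a pixel when its intensity exceeds the neighbourhood average minus an offset; the window radius, the offset of 6 (echoing the figure label) and the test image are illustrative assumptions.

```python
def adaptive_threshold(image, radius=1, offset=6):
    """Mark pixel (y, x) as 1 when I > S - offset, S being the local mean."""
    h, w = len(image), len(image[0])
    result = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Window is clipped at the image border.
            window = [image[yy][xx]
                      for yy in range(max(0, y - radius), min(h, y + radius + 1))
                      for xx in range(max(0, x - radius), min(w, x + radius + 1))]
            local_mean = sum(window) / len(window)
            result[y][x] = 1 if image[y][x] > local_mean - offset else 0
    return result


image = [
    [100, 100, 100, 100],
    [100, 20, 100, 100],
    [100, 100, 100, 100],
]
binary = adaptive_threshold(image)
print(binary)
```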


adaptive triangulation: See adaptive 
meshing. 


adaptive visual servoing: See visual 
servoing. [WP:Visual_Servoing] 


adaptive weighting: A scheme for 
weighting elements in a summation, 
voting or other formulation such that 
the relative influence of each ele- 
ment is representative (i.e., adapted to 
some underlying structure). For exam- 
ple this may be the similarity of pix- 
els within a neighborhood (e.g., an 
adaptive bilateral filter) or a property 
changing over time. See also adaptive. 
[YK06] 


additive color: The way in which multiple
wavelengths of light can be combined
to allow other colors to be perceived,
e.g., if equal amounts of green
and red light are shone onto a sheet
of white paper, the paper will appear
to be illuminated with a yellow light
source (see below). Contrast this with
subtractive color: [Gal90:3.7]


additive noise: Generally, image- 
independent noise that is added to 
it by some external process. The 
recorded image I at pixel (i, j) is then
the sum of the true signal S and the
noise N.


$I_{i,j} = S_{i,j} + N_{i,j}$


The noise added at each pixel (i, j) 
could be different. [Umb98:3.2] 


adjacency: See adjacent. 


adjacency graph: A graph that shows 
the adjacency between structures, 
such as segmented image regions. The 
nodes of the graph are the structures 
and an arc implies adjacency of the 
two structures connected by the arc. 
The figure shows the graph associated 
with the segmented image on the left: 
[AFF85] 


Regions Adjacency graph 
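A region adjacency graph might be built from a label image as sketched below. The sketch assumes 4-connectivity (regions are adjacent when they share a pixel edge); the label image and function name are hypothetical.

```python
def region_adjacency_graph(labels):
    """Return {region: set of adjacent regions} using 4-connectivity."""
    h, w = len(labels), len(labels[0])
    graph = {label: set() for row in labels for label in row}
    for y in range(h):
        for x in range(w):
            a = labels[y][x]
            # Looking right and down visits every 4-connected pixel pair once.
            for yy, xx in ((y, x + 1), (y + 1, x)):
                if yy < h and xx < w and labels[yy][xx] != a:
                    b = labels[yy][xx]
                    graph[a].add(b)
                    graph[b].add(a)
    return graph


segmentation = [
    ["A", "B", "C"],
    ["A", "B", "C"],
]
graph = region_adjacency_graph(segmentation)
print(graph)
```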


adjacent: Commonly meaning “next to 
each other”, whether in a physical 
sense of pixel connectivity in an 
image, image regions sharing some 
common boundary, nodes in a graph 
connected by an arc or components 
in a geometric model sharing some 
common bounding component. For- 
mally defining “adjacent” can be some- 
what heuristic because you may need 
a way to specify closeness (e.g., on 
a quantized grid of pixels) or to con- 
sider how much shared “boundary” 
is required before two structures are 
adjacent. [Nev82:2.1.1] 


affective body gesture: See affective 
gesture. 


affective gesture: A gesture made by 
the body (human or animal) which 
is indicative of emotional feeling or 
response. Used in gesture analysis to 
indicate social interaction. [GX11:5.4] 


affective state: The emotional state of
an entity (human or animal) relating
to emotional feeling or response.
Often measured via gesture analysis or 
facial expression analysis. See affective 
gesture. 


affine: A term first used by Euler. 
Affine geometry is a study of proper- 
ties of geometric objects that remain 
invariant under affine transformations 
(mappings), including parallelness, 
cross ratio and adjacency. [WP:Affine_ 
geometry] 


affine arc length: For a parametric equation
of a curve $f(u) = (x(u), y(u))$, arc
length is not preserved under an affine
transformation. The affine length

$\tau(u) = \int_0^u (\dot{x}\ddot{y} - \ddot{x}\dot{y})^{1/3}\, du$

is invariant under affine transformations.
[SQ04:8.4]


affine camera: A special case of the projective
camera that is obtained by constraining
the 3 x 4 camera parameter
matrix T such that $T_{3,1} = T_{3,2} = T_{3,3} = 0$
and reducing the camera parameter
vector from 11 degrees of freedom to
8. [FP03:2.3.1]


affine curvature: A measure of curvature
based on the affine arc length, $\tau$.
For a parametric equation of a curve
$f(\tau) = (x(\tau), y(\tau))$, its affine
curvature, $\mu$, is

$\mu(\tau) = x''(\tau)\, y'''(\tau) - x'''(\tau)\, y''(\tau)$
[WP:Affine_curvature]


affine flow: A method of finding the 
movement of a surface patch by 
estimating the affine transformation 
parameters required to transform the 
patch from its position in one view to 
another. [Cal05] 


affine fundamental matrix: The 
fundamental matrix which is obtained 
from a pair of cameras under affine 
viewing conditions. It is a 3 x 3 matrix
whose upper left 2 x 2 submatrix is all 
zero. [HZ00:13.2.1] 


affine invariant: An object or shape 
property that is not changed by (i.e.,
is invariant under) the application of 
an affine transformation. [FP03:18.4.1] 


affine length: See affine arc length. [WP: 
Affine_curvature] 


affine moment: Four shape measures 
derived from second and third order 
moments that remain invariant under 
affine transformations. They are given 
by the following equations, where 
each $\mu$ is the associated central
moment: [NA05:7.3] 


$I_1 = \frac{\mu_{20}\mu_{02} - \mu_{11}^2}{\mu_{00}^4}$


affine quadrifocal tensor: The form
taken by the quadrifocal tensor when 
specialized to the viewing condi- 
tions modeled by the affine camera. 
[HTM99] 


affine reconstruction: A three- 
dimensional reconstruction where 
the ambiguity in the choice of basis 
is affine only. Planes that are parallel 
in the Euclidean basis are parallel in 
the affine reconstruction. A projective 
reconstruction can be upgraded to an 
affine reconstruction by identification 
of the plane at infinity, often by 
locating the absolute conic in the 
reconstruction. [HZ00:9.4.1] 


affine registration: The registration 
of two or more images, surface 
meshes or point clouds using an affine 
transformation. [JV05] 


affine stereo: A method of scene recon- 
struction using two calibrated views of 
a scene from known viewpoints. It is 
a simple but very robust approxima- 
tion to the geometry of stereo vision, 
to estimate positions, shapes and sur- 
face orientations. It can be calibrated 
very easily by observing just four ref- 
erence points. Any two views of the 
same planar surface will be related by 
an affine transformation that maps one 
image to the other. This consists of a 
translation and a tensor, known as the 
disparity gradient tensor, representing 
the distortion in image shape. If the 
standard unit vectors X and Y in one
image are the projections of some vectors
on the object surface and the linear
mapping between images is represented
by a 2 x 3 matrix A, then the
first two columns of A will be the cor- 
responding vectors in the other image. 
Since the centroid of the plane will 
map to both image centroids, it can 
be used to find the surface orientation. 
[Qua93] 


affine transformation: A special set of
transformations in Euclidean geometry
that preserve some properties of the
construct being transformed:

• Points remain collinear: if three
points belong to the same straight
line, their images under affine transformations
also belong to the same
line and the middle point remains
between the other two points.

• Parallel lines remain parallel and
concurrent lines remain concurrent
(images of intersecting lines intersect).

• The ratio of lengths of the segments
of a given line remains constant.

• The ratio of areas of two triangles
remains constant.

• Ellipses remain ellipses; parabolas
remain parabolas and hyperbolas
remain hyperbolas.

• Barycenters of triangles (and other
shapes) map into the corresponding
barycenters.

Analytically, affine transformations are
represented in the matrix form

$f(x) = Ax + b$

where the determinant det(A) of the
square matrix A is not 0. In 2D, the
matrix is 2 x 2; in 3D it is 3 x 3.
[FP03:2.2]
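The matrix form can be checked numerically on a small 2D case. The sketch below applies f(x) = Ax + b to the endpoints and midpoint of a segment and confirms that the midpoint maps to the midpoint of the images, one instance of the preserved length ratio. The matrix, offset and points are arbitrary illustrative values.

```python
def affine_map(A, b, p):
    """Apply f(p) = A p + b to a 2D point p."""
    return (A[0][0] * p[0] + A[0][1] * p[1] + b[0],
            A[1][0] * p[0] + A[1][1] * p[1] + b[1])


A = [[2.0, 1.0], [0.5, 3.0]]   # det(A) = 5.5, so the map is invertible
b = (4.0, -1.0)

p, q = (0.0, 0.0), (4.0, 2.0)
m = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)   # midpoint of segment pq

fp, fq, fm = (affine_map(A, b, v) for v in (p, q, m))
mid_of_images = ((fp[0] + fq[0]) / 2, (fp[1] + fq[1]) / 2)
print(fm, mid_of_images)
```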


affine trifocal tensor: The form taken 
by the trifocal tensor when specialized 
to the viewing conditions modeled by 
the affine camera. [HTM99] 


affinely invariant region: Image 
patches that automatically deform with 
changing viewpoint in such a way that 
they cover identical physical parts of a 
scene. Since such regions are describ- 
able by a set of invariant features they 
are relatively easy to match between 
views under changing illumination. 
[TG00]


affinity matrix: A matrix capturing the 
similarity of two entities or their rela- 
tive attraction in a force- or flow-based 
model. Often referred to in graph 
cut formulations. See affinity metric. 
[Sze10:5.4] 


affinity metric: A measurement of 
the similarity between two entities 
(e.g., features, nodes or images). See 
similarity metric. 


affordance and action recognition: 
An affordance is an opportunity for 
an entity to take an action. The 
recognition of such occurrences thus 
identifies such action opportunities. 
See action recognition. [Gib86] 


age progression: Refers to work consid- 
ering the change in visual appearance 
because of the human aging process. 
Generally considered in tasks such as 
face recognition, face detection and 
face modeling. Recent work considers 
artificial aging of a sample facial image 
to produce an aged interpretation of 
the same: [GZSM07] 


agglomerative clustering: A class of
iterative clustering algorithms that
begin with a large number of clusters
and, at each iteration, merge
pairs (or tuples) of clusters. Stopping
the process at a certain number
of iterations gives the final set
of clusters. The process can be run
until only one cluster remains and
the progress of the algorithm can be
represented as a dendrogram.
[WP:Hierarchical_clustering]
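A minimal sketch of the merging loop follows. It assumes 1D points, single-linkage distance (the closest pair of members) and a stopping rule based on a target number of clusters; these are illustrative choices, and other linkages and stopping rules are possible.

```python
def agglomerative_clustering(points, num_clusters):
    """Merge the two closest clusters until num_clusters remain.

    Uses single linkage on 1D points; returns (clusters, merge_history).
    """
    clusters = [[p] for p in points]          # start: one cluster per point
    history = []
    while len(clusters) > num_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        history.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, history


clusters, history = agglomerative_clustering([1.0, 1.2, 5.0, 5.1, 9.0], 2)
print(clusters)
```

The merge history, read in order, is the information a dendrogram would display.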


AIC: See Akaike Information Criterion 
(AIC). 


Akaike Information Criterion 
(AIC): A method for statistical 
model selection where the best-fit 
log-likelihood is penalized by the num- 
ber of adjustable parameters in the 
model, so as to counter over-fitting. 
Compare with Bayesian information 
criterion. [Bis06:1.3] 
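As a small worked example, the criterion is often written as AIC = 2k - 2 ln L, where k is the number of adjustable parameters and ln L the best-fit log-likelihood; lower values are preferred. This form differs from some texts only by a constant factor and sign. The log-likelihood values below are hypothetical.

```python
def aic(log_likelihood, num_params):
    """Akaike Information Criterion in the common form 2k - 2 ln L."""
    return 2 * num_params - 2 * log_likelihood


# Hypothetical best-fit log-likelihoods for two candidate models.
candidates = {
    "line (2 params)": aic(log_likelihood=-120.0, num_params=2),
    "cubic (4 params)": aic(log_likelihood=-119.5, num_params=4),
}
best = min(candidates, key=candidates.get)
print(candidates, best)
```

Here the cubic fits slightly better but not by enough to pay for its two extra parameters, so the line is selected.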


albedo: Whiteness. Originally a term
used in astronomy to describe reflecting
power. If a body reflects 50% of the light falling
on it, it is said to have albedo of 0.5.
[FP03:4.3.3]

[Figure: albedo values 0.75, 0.5, 0.25, 0.0]


algebraic curve: A simple parameterized
curve representation using
Euclidean geometry for an object that
cannot be represented by linear properties
(see figure). Parameterized in
$\mathbb{R}^n$ Euclidean space, in the form of
$\{x : F(x) = 0\}$: [Gib98:Ch. 1]

[Figure: algebraic curve of unit circle: $x^2 + y^2 - 1 = 0$]


algebraic distance: A linear distance 
metric commonly used in computer 
vision applications because of its sim- 
ple form and standard matrix-based 
least mean square estimation operations.
If a curve or surface is defined
implicitly by $f(x, a) = 0$ (e.g., $x \cdot a = 0$
for a hyperplane), the algebraic distance
of a point $x_i$ to the surface is
simply $f(x_i, a)$. [FP03:10.1.5]
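The difference between algebraic and geometric distance can be seen on the unit circle, written implicitly as x^2 + y^2 - r^2 = 0. The sketch below evaluates both for a few points; the function names are illustrative.

```python
import math


def algebraic_distance_to_circle(x, y, radius=1.0):
    """Value of the implicit function f(x, y) = x^2 + y^2 - r^2."""
    return x * x + y * y - radius * radius


def geometric_distance_to_circle(x, y, radius=1.0):
    """Signed Euclidean distance from (x, y) to the circle."""
    return math.hypot(x, y) - radius


for point in [(1.0, 0.0), (1.1, 0.0), (2.0, 0.0)]:
    print(point,
          algebraic_distance_to_circle(*point),
          geometric_distance_to_circle(*point))
```

Both measures vanish on the curve, but away from it the algebraic value grows differently from the true distance, which is why it is cheap to minimize yet can bias a fit.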


algebraic point set surfaces: A smooth 
surface model defined from a point 
cloud representation using localized 
moving least squares fitting of an 
algebraic surface (namely an algebraic 
sphere). [GG07] 


algebraic surface: A parameterized 
surface representation using Euclidean 
geometry defined in $\mathbb{R}^n$ Euclidean
space. Regular 3D surfaces such as
planes, spheres, tori and generalized
cylinders occur in $\mathbb{R}^3$, in the form of
$\{x : F(x) = 0\}$. [Zar71]


aliasing: The erroneous replacement of
high spatial frequency (HF) components
by low-frequency ones when
a signal is sampled. The affected HF 
components are those that are higher 
than the Nyquist frequency, or half the 
sampling frequency. Examples include 
the slowing of periodic signals by 
strobe lighting and corruption of areas 
of detail in image resizing. If the 
source signal has no HF components, 
the effects of aliasing are avoided, so 
the low-pass filtering of a signal to 
remove HF components prior to sam- 
pling is one form of anti-aliasing. Con- 
sider the perspective projection of a 
checkerboard. The image is obtained 
by sampling the scene at a set of inte- 
ger locations. The spatial frequency 
increases as the plane recedes, producing
aliasing artifacts (jagged lines in the
foreground, moiré patterns in the background):


Removing high-frequency components 
(i.e., smoothing) before downsampling 
mitigates the effect: [FP03:7.4] 
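A one-dimensional example makes the effect concrete. Sampling a 9 Hz sine at 10 Hz (Nyquist frequency 5 Hz) yields exactly the samples of a sign-flipped 1 Hz sine, so the two signals cannot be told apart after sampling. The frequencies are arbitrary illustrative values.

```python
import math


def sample_sine(frequency_hz, sampling_rate_hz, num_samples):
    """Sample sin(2*pi*f*t) at t = n / sampling_rate."""
    return [math.sin(2 * math.pi * frequency_hz * n / sampling_rate_hz)
            for n in range(num_samples)]


fs = 10.0                         # sampling rate; Nyquist frequency is 5 Hz
high = sample_sine(9.0, fs, 20)   # 9 Hz lies above the Nyquist frequency
low = sample_sine(1.0, fs, 20)    # 1 Hz, the frequency it aliases to

# The 9 Hz samples coincide with those of a sign-flipped 1 Hz sine.
max_diff = max(abs(h + l) for h, l in zip(high, low))
print(max_diff)
```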


alignment: An approach to geometric 
model matching by registration of a 
geometric model to the image data. 
[FP03:18.2] 


ALVINN: Autonomous Land Vehicle In
a Neural Network; an early attempt,
at Carnegie Mellon University, to
learn a complex behavior (maneuvering
a vehicle) by observing humans.
[Pom89]


ambient light: Illumination by diffuse
reflections from all surfaces within
a scene (including the sky, which 
acts as an external distant surface). In 
other words, light that comes from 
all directions, such as the sky on a 
cloudy day. Ambient light ensures that 
all surfaces are illuminated, including 
those not directly facing light sources. 
[FP03:5.3.3] 


ambient space: Refers to the dimensional
space surrounding a given mathematical
object in general terms. For
example, a line can be studied in isola- 
tion or within a 2D space - in which 
case, the ambient space is a plane. Sim- 
ilarly a sphere may be studied in 3D 
ambient space. This is of particular rel- 
evance if the ambient space is nonlin- 
ear or skewed (e.g., a magnetic field). 
[SMC05]


AMBLER: An autonomous active vision
system using both structured light
and sonar, developed by NASA and
Carnegie Mellon University. It is supported
by a six-legged robot and is
intended for planetary exploration.
[BHK+89]


amplifier noise: Spurious additive noise
signal generated by the electronics in a
sampling device. The standard model 
for this type of noise is Gaussian. It 
is independent of the signal. In color 
cameras, where more amplification is 
used in the blue color channel than in 
the green or red channels, there tends 
to be more noise in the blue channel. 
In well-designed electronics, amplifier 
noise is generally negligible.
[WP:Image_noise#Amplifier_noise_.28Gaussian_noise.29]


analog/mixed analog-digital image
processing: The processing of images
as analog signals (e.g., by optical 
image processing) prior to or with- 
out any form of image digitization. 
Largely superseded by digital image 
processing. [RK82] 


analytic curve finding: A method of
detecting parametric curves by transforming
data into a feature space that
is then searched for the hypothesized 
curve parameters. An example is line 
finding using the Hough transform. 
[XOK90] 


anamorphic lens: A lens having one
or more cylindrical surfaces. Anamorphic
lenses are used in photography to
produce images that are compressed 
in one dimension. Images can later be 
restored to true form using a reversing 
anamorphic lens set. This form of lens 
is used in wide-screen movie photog- 
raphy. [WP:Anamorphic_lens] 


anatomical map: A biological model
usable for alignment with, or region
labeling of, a corresponding image 
dataset. For example, one could use 
a model of the brain’s functional 
regions to assist in the identification 
of brain structures in an NMR dataset. 
[GHC+00]


AND operator: A Boolean logic operator
that combines two input binary
images:

p   q   p&q
0   0   0
0   1   0
1   0   0
1   1   1

This approach is used to select image
regions by applying the AND logic at
each pair of corresponding pixels. The
rightmost image below is the result
of ANDing the two leftmost images:
[SB11:3.2.2]


angiography: A method for imaging 
blood vessels by introducing a dye that 
is opaque when photographed by X- 
ray. Also the study of images obtained 
in this way. [WP:Angiography] 


angularity ratio: Given two figures, X
and Y, $\alpha_i(X)$ and $\beta_i(Y)$ are angles
subtending convex parts of the contour of
the figure X and $\gamma_k(X)$ are angles
subtending plane parts of the contour of
figure X; the angularity ratios are:

$\frac{\sum_i \alpha_i(X)}{360^\circ}$
and
$\frac{\sum_i \beta_i(X)}{\sum_k \gamma_k(X)}$
[Lee64]


anisotropic diffusion: An edge-preserving
smoothing filter commonly
used for noise removal. Also called
Perona-Malik diffusion. See also
bilateral filtering. [Sze10:3.3]


anisotropic filtering: Any filtering technique
where the filter parameters vary
over the image or signal being filtered.
[WP:Anisotropic_filtering]


anisotropic structure tensor: A matrix
representing non-uniform, second-order
(i.e., gradient/edge) information
of an image or function neighborhood.
Commonly used in corner detection
and anisotropic filtering approaches.
[Sze10:3.3]


annotation: A general term referring to
the labeling of imagery either with
regard to manual ground truth labeling
or automatic image labeling of the output
of a scene understanding, semantic
scene segmentation or augmented
reality approach: [Sze10:14.6, 13.1.2]


anomalous behavior detection: Special
case of surveillance where human
movement is analyzed. Used in particular
to detect intruders or behavior
likely to precede or indicate crime.
[WP:Anomaly_detection]


anomaly detection: The automated
detection of an unexpected event,
behavior or object within a given
environment based on comparison
with a model of what is normally
expected within the same. Often considered
as an unsupervised learning
task and commonly applied in visual
industrial inspection and automated
visual surveillance. [Bar12:13.1.3]


antimode: The minimum between two 
maxima. One method of threshold 
selection is done by determining the 
antimode in a bimodal histogram. [WP: 
Bimodal_distribution#Terminology] 


[Figure: plot of f(x) against x with the antimode marked]
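Threshold selection by antimode can be sketched as follows. The sketch assumes a smooth bimodal histogram (no spurious noise peaks), picks the two highest interior local maxima and returns the lowest bin between them; the histogram counts are hypothetical.

```python
def find_antimode(histogram):
    """Return the bin index of the minimum between the two highest peaks."""
    peaks = [i for i in range(1, len(histogram) - 1)
             if histogram[i - 1] < histogram[i] >= histogram[i + 1]]
    if len(peaks) < 2:
        raise ValueError("histogram does not have two interior peaks")
    # Keep the two tallest peaks, then order them by bin position.
    first, second = sorted(sorted(peaks, key=lambda i: histogram[i])[-2:])
    return min(range(first, second + 1), key=lambda i: histogram[i])


histogram = [2, 8, 15, 9, 3, 1, 4, 10, 18, 7, 2]
threshold = find_antimode(histogram)
print(threshold)
```

A real histogram would usually be smoothed first so that small fluctuations are not mistaken for modes.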


aperture: Opening in the lens diaphragm 
of a camera through which light is 
admitted. This device is often arranged 
so that the amount of light can be 
controlled accurately. A small aperture 
reduces the amount of light available, 
but increases the depth of field. The 
figure shows nearly closed (left) and 
nearly open (right) aperture positions: 
[TV98:2.2.2] 




closed open 


aperture control: Mechanism for vary- 
ing the size of a camera’s aperture. 
[WP:Aperture#Aperture_control] 


aperture problem: If a motion sen- 
sor has a finite receptive field, it per- 
ceives the world through something 
resembling an aperture, making the 
motion of a homogeneous contour 
seem locally ambiguous. Within that 
aperture, different physical motions 
are therefore indistinguishable. For 
example, the two motions of the 
square below are identical in the cir- 
cled receptive fields: [Nal93:8.1.1] 


BEFORE    AFTER


apparent contour: The apparent contour
of a surface S in 3D is the set of
critical values of the projection of S
onto a plane, in other words, the silhouette.
If the surface is transparent,
the apparent contour can be decomposed
into a collection of closed curves
with double points and cusps. The convex
envelope of an apparent contour
is also the boundary of its convex hull.
[Nal93:Ch. 4]


apparent motion: The 3D motion suggested
by the image motion field, but
not necessarily matching the real 3D
motion. The reason for this mismatch
is that motion fields may be ambiguous;
that is, they may be generated by
different 3D motions or light source
movement. Mathematically, there may
be multiple solutions to the problem
of reconstructing 3D motion from the
image motion field. See also visual
illusion and motion estimation.
[WP:Apparent_motion]


appearance: The way an object
looks from a particular viewpoint
under particular lighting conditions.
[FP03:25.1.3]


appearance-based recognition:
Object recognition where the
object model encodes the possible 
appearances of the object (as con- 
trasted with a geometric model that 
encodes the shape, in model-based 
recognition). In principle, it is impos- 
sible to encode all appearances when 
occlusions are considered; however, 
small numbers of appearances can 
often be adequate, especially if there 
are not many models in the model 
base. There are many approaches 
to appearance-based recognition, 
such as using a principal component 
model to encode all appearances 
in a compressed framework, using 
color histograms to summarize the 
appearance or a set of local appear- 
ance descriptors such as Gabor filters 
extracted at interest points. A com- 
mon feature of these approaches is 
learning the models from examples. 
[TV98:10.4] 


appearance-based tracking: Methods
for object or target recognition in real
time, based on image pixel values in
each frame rather than derived fea- 
tures. Temporal filtering, such as the 
Kalman filter, is often used. [BJ98] 


appearance change: Changes in an 
image that are not easily accounted for 
by motion, such as an object actually 
changing form. [BFY98] 


appearance enhancement trans- 
form: Generic term for operations 
applied to images to change, or 
enhance, some aspect of them, such as 
brightness adjustment, contrast adjust- 
ment, edge sharpening, histogram 
equalization, saturation adjustment or 
magnification. [Hum77] 


appearance feature: An object or scene 
feature relating to visual appearance, 
as opposed to features derived from 
shape, motion or behavior analysis. 
[HFR06]


appearance flow: Robust methods for 
real-time object recognition from a 
sequence of images depicting a moving 
object. Changes in the images are used 
rather than the images themselves. It is 
analogous to processing using optical 
flow. [DTS06]


appearance model: A representation 
used for interpreting images that is 
based on the appearance of the object. 
These models are usually learned by 
using multiple views of the objects. 
See also active appearance model and 
appearance-based recognition. [WP: 
Active_appearance_model] 


appearance prediction: Part of the 
science of appearance engineering, 
where an object texture is changed 
so that the viewer experience is pre- 
dictable. [Kan97] 


appearance singularity: An image posi- 
tion where a small change in viewer 
position can cause a dramatic change 
in the appearance of the observed 
scene, such as the appearance or disap- 
pearance of image features. This is con- 
trasted with changes occurring when 
in a generic viewpoint. For example, 
when viewing the corner of a cube 
from a distance, a small change in view- 
point still leaves the three surfaces at 
the corner visible. However, when the
viewpoint moves into the infinite plane
containing one of the cube faces (a sin- 
gularity), one or more of the planes dis- 
appears. [MR98] 


arc length: If f is a function such
that its derivative f' is continuous on
some closed interval [a, b] then the arc
length of f from x = a to x = b is the
integral: [FP03:19.1]

$\int_a^b \sqrt{1 + [f'(x)]^2}\, dx$

[Figure: curve with the arc length marked between x = a and x = b]
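The arc-length integral of sqrt(1 + f'(x)^2) over [a, b] can be approximated numerically. The sketch below uses composite Simpson's rule, an illustrative choice of quadrature, and checks it on f(x) = cosh(x), whose arc length on [0, 1] is known to be sinh(1).

```python
import math


def arc_length(df, a, b, steps=1000):
    """Approximate the integral of sqrt(1 + f'(x)^2) over [a, b].

    df is the derivative f'; composite Simpson's rule with an even
    number of steps is used.
    """
    if steps % 2:
        steps += 1
    h = (b - a) / steps

    def integrand(x):
        return math.sqrt(1.0 + df(x) ** 2)

    total = integrand(a) + integrand(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(a + k * h)
    return total * h / 3.0


# f(x) = cosh(x) has f'(x) = sinh(x); its arc length on [0, 1] is sinh(1).
length = arc_length(math.sinh, 0.0, 1.0)
print(length, math.sinh(1.0))
```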


arc of graph: Two nodes in a graph 
can be connected by an arc (also 
called an edge). The edges can be 
either directed or undirected. The 
dashed lines here are the arcs: [WP: 
Graph_(mathematics)] 


architectural model reconstruction: 
A generic term for reverse engineering 
buildings based on collected 3D data as 
well as libraries of building constraints. 
[WZ02] 


area: The measure of a region or sur- 
face’s extension in some given units. 
The units could be image units, such 
as square pixels, or scene units, such 
as square centimeters. [JKS95:2.2.1] 


area-based: Operation applied to a 
region of an image; as opposed to pixel- 
based. [CS09:6.6] 


ARMA: See autoregressive moving 
average model. 


array processor: A group of time- 
synchronized processing elements that 
perform computations on data dis- 
tributed across them. Some array pro- 
cessors have elements that communi- 
cate only with their immediate neigh- 
bors, as in the topology shown below. 
See also single instruction multiple 
data. [WP:Vector_processor] 


[Figure: grid of processing elements linked to their immediate neighbors]

arterial tree segmentation: Generic 
term for methods used in finding 
internal pipe-like structures in med- 
ical images, such as NMR images, 
angiograms and X-rays. Example trees
include bronchial systems and veins.
[BWB+05] 


articulated object: An object composed 
of a number of (usually) rigid subparts 
or components connected by joints, 
which can be arranged in a number 
of different configurations. The human 
body is a typical example. [BM02:1.9] 


articulated object model: A represen- 
tation of an articulated object that 
includes its separate parts and their 
range of movement (typically joint 
angles) relative to each other. [RK95] 


articulated object segmentation: 
Methods for acquiring an articulated 
object from 2D or 3D data. [YP06] 


articulated object tracking: Tracking
an articulated object in an image
sequence. This includes both the pose 
of the object and also its shape 
parameters, such as joint angles. [WP: 
Finger_tracking] 


aspect graph: A graph of the set of views
(aspects) of an object, where the arcs
of the graph are transitions between 
two neighboring views (the nodes) and 
a change between aspects is called 
a visual event. See also characteristic 
view. This graph shows some of the 
aspects of a hippopotamus: [FP03:20] 


[Figure: aspect graph of a hippopotamus]


aspect ratio: 1) The ratio of the sides of
the bounding box of an object, where
the orientation of the box is chosen to 
maximize this ratio. Since this measure 
is scale invariant, it is a useful metric 
for object recognition. 

2) In a camera, the ratio of the horizon- 
tal to vertical pixel sizes. 

3) In an image, the ratio of the image 
width to height - an image of 640 by 
480 pixels has an aspect ratio of 4:3. 
[Low91:2.2] 


aspect: See characteristic view and
aspect graph.


asperity scattering: A light scattering
effect, common to the modeling or
photography of human skin, caused 
by sparsely distributed point scatters 
over the surface. In the case of human 
skin, these point scatters are vellus 
(short, fine, light-colored and barely 
noticeable) hairs present on the surface.
[INN07:3.3]


association graph: A graph used in
structure matching, such as matching a
geometric model to a data description. 
In this graph, each node corresponds
to a pairing between a model and a data
feature (with the implicit assumption 
that they are compatible). Arcs in the 
graph mean that the two connected 
nodes are pairwise compatible. Find- 
ing maximal cliques is one technique 
for finding good matches. The graph 
below shows a set of pairings of model 
features A, B and C with image features 
a, b, c and d. The maximal clique con- 
sisting of A:a, B:b and C:c is one match 
hypothesis: [BB82:11.2.1] 


astigmatism: A refractive error affecting
where the light is focused within an
optical system. It occurs when a lens 
has irregular curvature causing light 
rays to focus at an area, rather than at 
a point: 


It may be corrected with a toric
lens, which has a greater refractive
power along one axis than the other. In
human eyes, astigmatism often occurs 
with nearsightedness and farsighted- 
ness. [FP03:1.2.3] 


asymmetric SVM: A variant on tra- 
ditional support vector machine 
classification where the false positives 
are modeled in the training objective 
of finding a maximal margin of classifi- 
cation. An asymmetric SVM maximizes 
the margin between the negative class
and the core (i.e., high-confidence
subset) of the positive class by intro- 
ducing a secondary core margin in 
addition to the traditional inter-class 
margin. They are jointly optimized 
within the training optimization cycle. 
[WLCC08]


atlas-based segmentation: A segmen- 
tation technique used in medical 
image processing, especially with 
brain images. Automatic tissue segmen- 
tation is achieved using a model of the 
brain structure and imagery (see atlas 
registration) compiled with the assis- 
tance of human experts. See also image 
segmentation. [VYCL03] 


atlas registration: An image registra- 
tion technique used in medical image 
processing, especially to register brain 
images. An atlas is a model (perhaps sta- 
tistical) of the characteristics of multi- 
ple brains, providing examples of nor- 
mal and pathological structures. This 
makes it possible to take into account 
anomalies that single-image registra- 
tion could not. See also medical image 
registration. [HSS+08] 


atomic action: In gesture analysis and 
action recognition, a short sequence 
of basic limb movements that form the 
pattern of movement associated with a 
higher level action. For example, “lift 
right leg in front of left leg” for a run- 
ning action or “swing right hand” and 
“rotate upper body” for taking a bad- 
minton shot. [GX11:Ch. 1] 


ATR: See automatic target recognition. 
[WP:Automatic_target_recognition] 


attached shadow: A shadow caused by 
an object on itself by self-occlusion. See 
also cast shadow. [FP03:5.3.1] 


attention: See visual attention. [WP: 
Attention] 


attenuation: The reduction of a partic- 
ular phenomenon, e.g., noise attenua- 
tion is the reduction of image noise. 
[WP:Attenuation]


attributed graph: A graph useful for
representing different properties of an
image. Its nodes are image segments
attributed with properties such as their
color or shape. The relations between
them, such as relative texture or
brightness, are encoded as arcs.
[BM02:4.5.2]


atypical co-occurrence: An unusual 
joint occurrence of two or more events 
or observations against a priori expec- 
tation. [GX11:Ch. 9] 


augmented reality: Primarily a projec- 
tion method that adds, e.g., graphics or 
sound as an overlay to the original image or
audio. For example, a fire-fighter’s hel- 
met display could show exit routes reg- 
istered to his or her view of the build- 
ing. [WP:Augmented_reality] 


autocalibration: The recovery of a 
camera’s calibration using only point 
(or other feature) correspondences 
from multiple uncalibrated images 
and geometric consistency constraints 
(e.g., that the camera settings are the 
same for all images in a sequence). 
[Low91:13.7] 


autocorrelation: The extent to which
a signal is similar to shifted copies of
itself. For an infinitely long 1D signal
$f(t): \mathbb{R} \to \mathbb{R}$, the autocorrelation at a
shift $\Delta t$ is

$$R_f(\Delta t) = \int_{-\infty}^{\infty} f(t)\, f(t + \Delta t)\, dt$$

The autocorrelation function $R_f$
always has a maximum at 0. A
peaked autocorrelation function
decays quickly away from $\Delta t = 0$.
The sample autocorrelation function
of a finite set of values $f_{1..n}$ is
$\{r_f(d) \mid d = 1, \ldots, n - 1\}$ where

$$r_f(d) = \frac{\sum_{i=1}^{n-d} (f_i - \bar{f})(f_{i+d} - \bar{f})}{\sum_{i=1}^{n} (f_i - \bar{f})^2}$$

and $\bar{f} = \frac{1}{n} \sum_{i=1}^{n} f_i$ is the sample mean.
[WP:Autocorrelation]
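
The sample autocorrelation can be computed directly from its definition. This is a minimal sketch; the function name `sample_autocorrelation` and the test signal are hypothetical, chosen only for illustration.

```python
def sample_autocorrelation(values, d):
    """Sample autocorrelation r_f(d) of a finite sequence at lag d."""
    n = len(values)
    if not 1 <= d <= n - 1:
        raise ValueError("lag d must satisfy 1 <= d <= n - 1")
    mean = sum(values) / n
    # Numerator: products of mean-centered values that are d samples apart.
    numerator = sum(
        (values[i] - mean) * (values[i + d] - mean) for i in range(n - d)
    )
    # Denominator: total squared deviation from the sample mean.
    denominator = sum((v - mean) ** 2 for v in values)
    return numerator / denominator  # ZeroDivisionError for a constant signal


# An alternating signal is anti-correlated at lag 1 and correlated at lag 2.
signal = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
r1 = sample_autocorrelation(signal, 1)
r2 = sample_autocorrelation(signal, 2)
```

Because the numerator sums only n - d products while the denominator sums n terms, the magnitude of the estimate shrinks as the lag grows, even for a perfectly periodic signal.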


autofocus: Automatic determination and
control of image sharpness in an optical
or vision system. There are two
major variations in this control system: 
active focusing and passive focusing. 
Active autofocus is performed using 
a sonar or infrared signal to deter- 
mine the object distance. Passive aut- 
ofocus is performed by analyzing the 
image itself to optimize differences 
between adjacent pixels in the CCD 
array. [WP:Autofocus] 


automated visual surveillance: The
generalized use of automatic scene
understanding approaches, commonly
including object detection,
tracking and, more recently, behavior
classification. [Dav90:Ch. 22]


automatic: Performed by a machine 
without human intervention. 
The opposite of “manual”. [WP: 
Automation] 


automatic target recognition (ATR): 
The detection of hostile objects in a 
scene using sensors and algorithms. 
Sensors are of many different types, 
sampling in infrared and visible light 
and using acoustic sonar and radar. 
[WP:Automatic_target_recognition] 


autonomous vehicle: A mobile robot 
controlled by computer, with human 
input operating only at a very high 
level, e.g., stating the ultimate des- 
tination or task. Autonomous naviga- 
tion requires the visual tasks of route 
detection, self-localization, landmark 
location and obstacle detection, as well 
as robotic tasks such as route planning 
and motor control. [WP:Autonomous_ 
car] 


autoregressive model: A model that
uses statistical properties of the past
behavior of some variable to predict
future behavior of that variable. For
example, a signal $x_t$ at time $t$ satisfies
an autoregressive model of order $p$ if
$x_t = \sum_{n=1}^{p} \alpha_n x_{t-n} + \omega_t$, where
$\omega_t$ is noise. The model could also
be nonlinear. [WP:Autoregressive_
model]
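
A one-step prediction under a linear autoregressive model can be sketched as follows. The function name `ar_predict` and the example coefficients are hypothetical, and the noise term is assumed to have zero mean, so it is omitted from the prediction.

```python
def ar_predict(history, coeffs):
    """Predict x_t from the p most recent values of an AR(p) model.

    coeffs[n - 1] multiplies x_{t-n}; the noise term is assumed to
    have zero mean and is therefore left out of the prediction.
    """
    p = len(coeffs)
    if len(history) < p:
        raise ValueError("need at least p past values")
    # history[-n] is the value observed n steps before the predicted one.
    return sum(a * history[-n] for n, a in enumerate(coeffs, start=1))


# The AR(2) model x_t = 2 x_{t-1} - x_{t-2} continues a linear trend.
next_value = ar_predict([1.0, 2.0, 3.0], [2.0, -1.0])
```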


autoregressive moving average 
(ARMA) model: Combines an 
autoregressive model with a moving 
average for the statistical analysis and 
future value prediction of time series 
information. [BJ71] 


autostereogram: An image similar to 
a random dot stereogram in which
the corresponding features are com-
bined into a single image. Stereo fusion 
allows the perception of a 3D shape in 
the 2D image. [WP:Autostereogram] 


average smoothing: See mean
smoothing.


AVI: Microsoft format for audio and 
video files ("audio video interleave").
Unlike MPEG, it is not a standard, so 
that compatibility of AVI video files and 
AVI players is not always guaranteed. 
[WP:Audio_Video_Interleave] 


axial representation: A region 
representation that uses a curve to 
describe the image region. The axis 
may be a skeleton derived from the 
region by a thinning process. [RM91] 


axiomatic computer vision: 
Approaches relating to the core 
principles (or axioms) of computer 
vision. Commonly associated with the 
interpretation of Marr’s theory for the 
basic building blocks of how visual 
interpretation and scene understanding
should ideally be performed. Most 
recently associated with low-level 
feature detection (e.g., edge detection 
and corner detection). [KZM05] 


axis of elongation: 1) The line that
minimizes the second moment of
the data points. If $\{x_i\}$ are the data
points, and $d(x, L)$ is the distance from
point $x$ to line $L$, then the axis of
elongation $A$ minimizes $\sum_i d(x_i, A)^2$.
Let $\mu$ be the mean of $\{x_i\}$. Define
the scatter matrix $S = \sum_i (x_i - \mu)(x_i - \mu)^T$.
Then the axis of elongation
is the eigenvector of $S$ with the
largest eigenvalue. See also principal
component analysis. The figure shows
a possible axis of elongation for a set
of points:


2) The longer midline of the bounding
box with largest length-to-width ratio.
[JKS95:2.2.3]
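
For 2D points, the eigenvector in sense 1) has a closed form, so no general eigen-solver is needed. The sketch below is restricted to the 2D case (an assumption of this example) and uses the hypothetical name `axis_of_elongation`; it returns the mean, through which the axis passes, and a unit direction vector.

```python
import math


def axis_of_elongation(points):
    """Return (mean, unit direction) of the axis of elongation of 2D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the 2x2 scatter matrix S = sum_i (x_i - mu)(x_i - mu)^T.
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    # Orientation of the eigenvector with the largest eigenvalue of a
    # symmetric 2x2 matrix. If S is isotropic the direction is arbitrary.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (mx, my), (math.cos(angle), math.sin(angle))


# Points along the diagonal y = x give an axis at 45 degrees.
centre, direction = axis_of_elongation([(0, 0), (1, 1), (2, 2), (3, 3)])
```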


axis of rotation: A line about which a 
rotation is performed. Equivalently, the 
line whose points are fixed under the 
action of a rotation. Given a 3D rotation 
matrix R, the axis is the eigenvector of 
R corresponding to the eigenvalue 1. 
[JKS95:12.2.2] 


axis-angle curve representation: A
rotation representation based on the
amount of twist $\theta$ about the axis
of rotation, here a unit vector
$a$. The quaternion rotation representation
is similar. [WP:Axis-angle_
representation]
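
The link between a rotation matrix, its axis of rotation and the axis-angle form can be sketched as below. The function names are hypothetical. The matrix is built with Rodrigues' formula, and the axis is recovered from the antisymmetric part of R rather than with a general eigen-solver; this shortcut assumes the angle is strictly between 0 and 180 degrees.

```python
import math


def rotation_from_axis_angle(axis, theta):
    """Rodrigues' formula: 3x3 rotation by theta about the given axis."""
    ax, ay, az = axis
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    ax, ay, az = ax / norm, ay / norm, az / norm
    c, s = math.cos(theta), math.sin(theta)
    k = 1.0 - c
    return [
        [c + k * ax * ax, k * ax * ay - s * az, k * ax * az + s * ay],
        [k * ay * ax + s * az, c + k * ay * ay, k * ay * az - s * ax],
        [k * az * ax - s * ay, k * az * ay + s * ax, c + k * az * az],
    ]


def axis_angle_from_rotation(R):
    """Recover (unit axis, angle) from R, assuming 0 < angle < pi."""
    # R - R^T equals 2 sin(theta) times the cross-product matrix of the axis.
    vx = R[2][1] - R[1][2]
    vy = R[0][2] - R[2][0]
    vz = R[1][0] - R[0][1]
    two_sin = math.sqrt(vx * vx + vy * vy + vz * vz)
    if two_sin < 1e-12:
        raise ValueError("angle is 0 or pi; axis not recoverable this way")
    two_cos = R[0][0] + R[1][1] + R[2][2] - 1.0  # trace(R) = 1 + 2 cos(theta)
    return (vx / two_sin, vy / two_sin, vz / two_sin), math.atan2(two_sin, two_cos)


axis = (1.0 / 3.0, 2.0 / 3.0, 2.0 / 3.0)  # a unit vector
theta = 0.7
R = rotation_from_axis_angle(axis, theta)
recovered_axis, recovered_theta = axis_angle_from_rotation(R)
```

The recovered axis is left unchanged by R, which is the eigenvalue-1 property stated in the axis of rotation entry.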


B-rep: See surface boundary
representation.


b-spline: A curve approximation spline
represented as a combination of basis
functions:

$$c(t) = \sum_{i=0}^{m} a_i B_i(t)$$

where $B_i$ are the basis functions and
$a_i$ are the control points. B-splines do
not necessarily pass through any of the 
control points; however, if b-splines 
are calculated for adjacent sets of con- 
trol points the curve segments will join 
up and produce a continuous curve. 
[JKS95:13.7.1] 
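
The joining of adjacent segments can be illustrated with a uniform cubic b-spline, where each segment is controlled by four consecutive control points. This is a sketch for that special case only; the function name `cubic_bspline_point` and the control points are hypothetical.

```python
def cubic_bspline_point(ctrl, t):
    """Evaluate one uniform cubic b-spline segment at t in [0, 1].

    ctrl holds the four consecutive control points of the segment.
    """
    # Uniform cubic basis functions; they sum to 1 for every t.
    b0 = (1 - t) ** 3 / 6
    b1 = (3 * t**3 - 6 * t**2 + 4) / 6
    b2 = (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6
    b3 = t**3 / 6
    return tuple(
        b0 * p0 + b1 * p1 + b2 * p2 + b3 * p3
        for p0, p1, p2, p3 in zip(*ctrl)
    )


control_points = [(0, 0), (1, 2), (3, 3), (4, 0), (6, 1)]

# The first segment uses points 0..3, the second uses points 1..4.
start_of_first = cubic_bspline_point(control_points[0:4], 0.0)
end_of_first = cubic_bspline_point(control_points[0:4], 1.0)
start_of_second = cubic_bspline_point(control_points[1:5], 0.0)
```

Here `end_of_first` and `start_of_second` coincide, so the segments form a continuous curve, while `start_of_first` is not one of the control points.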


b-spline fitting: Fitting a b-spline to a 
set of data points. This is useful for 
noise reduction or for producing a 
more compact model of the observed 
curve. [JKS95:13.7.1] 


b-spline snake: A snake made from
b-splines. [BHU00]


back projection: 1) A form of display
where a translucent screen is illuminated
from the side facing away from
the viewer.

2) The computation of a 3D quantity
from its 2D projection. For example,
a 2D homogeneous point $x$ is the
projection of a 3D point $X$ by a perspective
projection matrix $P$, so $x = PX$.
The backprojection of $x$ is the 3D
line $\{\mathrm{null}(P) + \lambda P^{+} x\}$ where $P^{+}$ is the
pseudoinverse of $P$.

3) Sometimes used interchangeably 
with triangulation. 

4) Technique to compute the attenuation
coefficients from intensity profiles
covering a total cross section under
various angles. It is used in CT and 
MRI to recover 3D from essentially 2D 
images. 

5) Projection of the estimated 3D posi- 
tion of a shape back into the 2D image 
from which the shape’s pose was esti- 
mated. [Jai89:10.3] 
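
Sense 2) can be sketched without external libraries for a finite camera. The code assumes that the left 3x3 block of P is invertible and that P has full row rank, so that the pseudoinverse is P^T (P P^T)^-1; all function names and the example camera matrix are hypothetical.

```python
def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def project(P, X):
    """Apply the 3x4 projection matrix P to a homogeneous 3D point X."""
    return [sum(P[r][k] * X[k] for k in range(4)) for r in range(3)]


def backproject(P, x):
    """Return (camera centre, direction) of the ray {null(P) + lambda P+ x}."""
    M = [row[:3] for row in P]
    p4 = [row[3] for row in P]
    M_inv = inv3(M)
    # Null vector of P: the homogeneous camera centre (-M^-1 p4, 1).
    centre = [-sum(M_inv[r][k] * p4[k] for k in range(3)) for r in range(3)] + [1.0]
    # Pseudoinverse applied to x: P^T (P P^T)^-1 x.
    PPt = [[sum(P[r][k] * P[c][k] for k in range(4)) for c in range(3)] for r in range(3)]
    PPt_inv = inv3(PPt)
    y = [sum(PPt_inv[r][k] * x[k] for k in range(3)) for r in range(3)]
    direction = [sum(P[k][j] * y[k] for k in range(3)) for j in range(4)]
    return centre, direction


def point_on_ray(centre, direction, lam):
    """Homogeneous 3D point on the backprojected line for parameter lam."""
    return [c + lam * d for c, d in zip(centre, direction)]


P = [[800.0, 0.0, 320.0, 80.0], [0.0, 800.0, 240.0, -40.0], [0.0, 0.0, 1.0, 0.5]]
X_true = [0.5, -0.2, 4.0, 1.0]
x = project(P, X_true)
centre, direction = backproject(P, x)
```

Every point on the ray with a nonzero parameter reprojects to the same image position as x, which is the defining property of the backprojected line.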


background: In computer vision, generally
used in the context of object recognition.
The background is either the
area of the scene behind the objects 
of interest or the part of the image 
whose pixels sample from the back- 
ground in the scene. As opposed to 
foreground. See also figure-ground 
separation. [JKS95:2.5] 


background labeling: Methods for
differentiating objects in the foreground
of an image or objects of interest from 
those in the background. [Low91:10.4] 


background modeling: Segmentation
or change detection method where the
scene behind the objects of interest is 
modeled as a fixed or slowly changing 
background, with possible foreground 
occlusions. Each pixel is modeled as 
a distribution which is then used to 
decide if a given observation belongs 
to the background or an occluding 
object. [NA05:3.5.2] 
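
One common choice of per-pixel distribution is a single Gaussian that is updated slowly over time. The sketch below models one pixel this way; the class name, the learning rate and the 2.5 standard deviation threshold are assumptions of this example rather than values from the entry above.

```python
class PixelBackgroundModel:
    """Single-Gaussian background model for one pixel (illustrative only)."""

    def __init__(self, mean, variance, learning_rate=0.05, threshold=2.5):
        self.mean = mean
        self.variance = variance
        self.learning_rate = learning_rate
        self.threshold = threshold  # allowed deviation in standard deviations

    def is_background(self, value):
        """True if the observation is within the threshold of the model."""
        return (value - self.mean) ** 2 <= self.threshold**2 * self.variance

    def update(self, value):
        """Classify the observation, then adapt the model if it is background."""
        background = self.is_background(value)
        if background:
            diff = value - self.mean
            rate = self.learning_rate
            self.mean += rate * diff
            self.variance = (1.0 - rate) * self.variance + rate * diff * diff
        return background


model = PixelBackgroundModel(mean=100.0, variance=4.0)
# The value 130 is far from the modeled intensity and is labeled foreground.
labels = [model.update(v) for v in (101.0, 99.0, 130.0, 100.0)]
```

Skipping the update for foreground observations keeps an occluding object from being absorbed into the background too quickly.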


background normalization: Removal
of the background by some image-
processing technique to estimate the 
background image and then dividing 
or subtracting the background from 
an original image. The technique is 
useful for non-uniform backgrounds. 
[JKS95:3.2.1] The following figures 
show the input image: 


[Figure: input image]


the background estimate obtained by
the dilate operator with a ball(9, 9)
structuring element:


and the (normalized) division of the 
input image by the background image: 


[Figure: input image divided by the background estimate]


background subtraction: The separation
of image foreground components
achieved by subtracting pixel values 
belonging to the image background 
obtained by a background modeling 
technique: [MC11:3.5.1] 


[Figure: foreground objects]


back lighting: A method of illuminat- 
ing a scene where the background 
receives more illumination than the 
foreground. Commonly this is used 
to produce a silhouette of an opaque 
object against a lit background, for eas- 
ier object detection. [Gal90:2.1.1] 


back-propagation: One of the best- 
studied neural network training algo- 
rithms for supervised learning. The 
name arises from using the propaga- 
tion of the discrepancies between the 
computed and desired responses at 
the network output back to the net- 
work inputs. The discrepancies are 
one of the inputs into the network 
weight recomputation process. [WP: 
Backpropagation] 


back-tracking: A basic technique for 
graph searching: if a terminal but non- 
solution node is reached, the search 
does not terminate with failure, but 
continues with still unexplored chil- 
dren of a previously visited non- 
terminal node. Classic back-tracking 
algorithms are breadth-first, depth- 
first, and A*. See also graph, graph 
searching, search tree. [BB82:11.3.2] 


bag of detectors: An object detection 
approach driven by ensemble learning 
using a set of independently trained 
detection concepts (detectors), possibly
of the same type (e.g., random
forest). Results from the set are
combined using bagging or boosting
to produce detection results. [HBS09]


bag of features: A generalized feature
representation approach whereby a set
of high-dimensional feature descriptors
(e.g., SIFT or SURF) are encoded
via quantization to a set of identified
unordered code-words in the same
dimensional space. This quantization
set is denoted as the codebook (or
dictionary) containing visual words or
visual codewords. The frequency of
occurrence of these quantized features
within a given sample image, represented
as a histogram, is commonly
used as an input to a classifier in
generalized object recognition, object
detection and scene classification
approaches. See object recognition for
additional terminology. [Sze10:14.4.1]


bag of words: See bag of features.


bagging: An ensemble learning
approach where the overall result
of the ensemble set is formed as
either a majority vote, in the case
of classification, or a mean value, in
regression cases. [Bis06:14.2]
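
The quantization step of the bag of features representation can be sketched as a nearest-codeword lookup followed by a normalized histogram. The function names, the tiny 2D "descriptors" and the three-word codebook are hypothetical; real descriptors such as SIFT have far more dimensions, and the codebook is assumed to have been learned beforehand (e.g., by clustering).

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codeword closest to the descriptor (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])),
    )


def bag_of_features_histogram(descriptors, codebook):
    """Normalized histogram of visual-word occurrences for one image."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_codeword(descriptor, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


codebook = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
descriptors = [(1.0, 1.0), (9.0, 1.0), (0.5, -0.5), (1.0, 8.0)]
histogram = bag_of_features_histogram(descriptors, codebook)
```

The resulting histogram has one bin per visual word and can be passed to a classifier as a fixed-length feature vector, regardless of how many descriptors the image produced.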


balanced filter: A phrase in general
usage to refer to a filtering approach
that gives equivalent weights to sam- 
ples spatially, spectrally or tempo- 
rally in any direction (e.g., Gaussian 
smoothing, mean filtering or low-pass 
filtering). By contrast, an unbalanced 
filter, such as bilateral filtering, could 
be considered an adaptive approach. 


bandpass filter: A signal processing fil- 
tering technique that allows signals 
between two specified frequencies to 
pass but cuts out signals at all other 
frequencies. [FP03:9.2.2] 


bar: A raw primal sketch primitive that
represents a dark line segment against
a lighter background (or its inverse).
Bars are also one of the primitives in
Marr’s theory of vision. The following
small dark bar is observed inside
a receptive field: [MH80]

[Figure: a small dark bar inside a receptive field]


bar-code reading: Methods and algorithms
used for the detection, imaging
and interpretation of parallel black
lines of different widths, arranged 
to give details on products or other 
objects. Bar codes themselves have 
many different coding standards and 
arrangements. An example bar code is: 
[Gal90:Ch. 7] 


[Figure: example bar code]


bar detector: 1) Method or algorithm
that produces maximum excitation
when a bar is in its receptive field; 2) 
device used by thirsty undergraduates. 
[WP:Feature_detection_(nervous_ 
system)#History] 


barrel distortion: Geometric lens
distortion in an optical system that
causes the outlines of an object
to curve outward, forming a barrel
shape. See also pincushion distortion.
[Hec87:6.3.1]


Barycentric coordinates: A scheme
for position-invariant geometry, where
point positions are expressed as a
weighted sum of a set of control points.
For example, in a triangular mesh,
points on the surface of the mesh
can be expressed as a weighted sum
of the containing triangle’s vertices.
[FP03:12.1.1]. Here, the Barycentric
coordinates of point X are (a, b, c)
because X = aA + bB + cC:
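
For a 2D triangle, the weights (a, b, c) can be computed in closed form. This is a minimal sketch assuming a non-degenerate triangle and weights that sum to 1; the function name `barycentric` and the example triangle are hypothetical.

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates of 2D point p in triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    if det == 0:
        raise ValueError("degenerate triangle")
    wa = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    wb = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    wc = 1.0 - wa - wb  # the three weights sum to 1
    return wa, wb, wc


A, B, C = (0.0, 0.0), (4.0, 0.0), (0.0, 4.0)
weights = barycentric((1.0, 1.0), A, B, C)

# Reconstruct the point as the weighted sum of the vertices.
reconstructed = tuple(
    weights[0] * A[k] + weights[1] * B[k] + weights[2] * C[k] for k in range(2)
)
```

All three weights lie in [0, 1] exactly when the point is inside the triangle or on its boundary.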


barycentrum: See center of mass. 


bas-relief ambiguity: The ambiguity
in reconstructing a 3D object
with Lambertian surface reflectance
using shading from an image under
orthographic projection. If the true 
surface is z(x, y), then the family of sur- 
faces az(x, y) + bx + cy generate iden- 
tical images under these viewing condi- 
tions, so any reconstruction for any val- 
ues of a, b, c is equally valid. The ambi- 
guity is thus up to a three-parameter 
family. [BKY99] 


baseline: Distance between two cam- 
eras in a binocular stereo system: 
[DH73:10.6] 


[Figure: binocular stereo geometry showing the object point, epipolar plane, left and right image planes, left and right cameras, and the stereo baseline]


basis function representation: A 
method of representing a function as 
a sum of simple (usually orthonormal) 
functions. For example, the Fourier 
transform represents functions as a 
weighted sum of sines and cosines. 
[Jai89:1.2]


batch learning: A machine learning 
training approach whereby all of the 
training set is processed together as 
a batch set before any update to 
the learning concept is made. This 
contrasts with incremental learning 
or online learning, where an update 
is made after individually process- 
ing each instance of the training set. 
Batch vs. incremental learning is com- 
monly a trade-off in back-propagation 
training for neural networks and 
gradient descent approaches in general.
[Bis06:2.5.4]


Bayer pattern: A distribution pattern of
cells on a CCD digital camera for sensing
RGB color such that half the cells
sense green light whilst the remaining
cells are equally split between sensing
red or blue light. This is because
the luminance component of the
image is largely determined by the
sensed green light values, and the human
visual system is more sensitive to
high-frequency detail in luminance. A color
image is created via demosaicing of the
Bayer pattern: [Sze10:2.3]

[Figure: incident light falling on a sensor cell array arranged in a Bayer pattern]


Bayes’ rule: The relationship between
the conditional probability of event A
given B and the conditional probability
of event B given event A. This is
expressed as

$$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}$$

providing that $P(B) \neq 0$.
[SQ04:14.2.1]
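
Bayes’ rule can be applied over a set of mutually exclusive classes, with the denominator obtained by summing over all of them. The sketch below does this for a hypothetical two-class inspection example; the function name `posterior` and all the numbers are illustrative assumptions.

```python
def posterior(priors, likelihoods):
    """Posterior probability of each class given one observation.

    priors[c] is P(c); likelihoods[c] is P(observation | c).
    """
    # Denominator of Bayes' rule: total probability of the observation.
    evidence = sum(priors[c] * likelihoods[c] for c in priors)
    if evidence == 0:
        raise ValueError("the observation has zero probability")
    return {c: priors[c] * likelihoods[c] / evidence for c in priors}


# Hypothetical numbers: 1% of parts are defective; the detector fires on
# 90% of defective parts and on 5% of good parts.
priors = {"defect": 0.01, "ok": 0.99}
likelihoods = {"defect": 0.9, "ok": 0.05}
post = posterior(priors, likelihoods)
most_likely = max(post, key=post.get)
```

Even with a fairly reliable detector, the low prior keeps the posterior probability of "defect" at roughly 0.15, so "ok" remains the most likely class.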


Bayes’ theorem: See Bayes’ rule. 


Bayesian adaptation: An adaptive training
approach for Bayesian learning
whereby a Bayesian classifier that has 
been pre-trained on a large ubiqui- 
tous training set is adapted to a spe- 
cific deployment or usage environment 
by adjustment (re-weighting) using a 
smaller adaptation set drawn from that 
environment. [Tos11:8.3.2] 


Bayesian classifier: A mathematical
approach to classifying a datapoint by
selecting the class most likely to have
generated it. If $x$ is the data and $c$ is
a class, then the posterior probability
of that class is $p(c|x)$. This probability
can be computed using Bayes’ rule,
which says that $p(c|x) = \frac{p(x|c)\,p(c)}{p(x)}$. Subsequently,
we can compute $p(c|x)$ in
terms of the probability $p(x|c)$ of having
observed the given data $x$ under
class $c$, $p(c)$ the prior probability of
class $c$, and $p(x)$, which is computed as
$p(x) = \sum_{c'} p(x|c')\,p(c')$, summing over
all classes. The Bayesian classifier is
the most common statistical classifier
currently used in computer vision processes.
Note that the Bayesian classifier
is so called not because it is a Bayesian
statistical model, but because of the
use of Bayes’ rule. [DH73:2.2]


Bayesian data association: The use of
probability following Bayes’ rule to
tackle the data association problem
of multiple target tracking based on
association by maximum a posteriori
probability given the image features
present. [GX11:5.1.3]


Bayesian filtering: A probabilistic data
fusion technique. It uses a formulation
of probabilities to represent the system
state and likelihood functions to
represent their relationships. In this
form, Bayes’ rule can be applied and
further related probabilities deduced.
[WP:Bayesian_filtering]


Bayesian graph: A graph model
used to illustrate a joint probability
distribution. Nodes in the graph represent
variables (states) and edges
represent the influence of one variable
(denoted states) upon another. A
generalization of a Bayesian network.
[MYA95]


Bayesian information criterion:
A model selection approach, following
Bayes’ rule, based on the
likelihood of a given model, with a
given parameterization, considering
the evidence of the data, the prior
probability of model occurrence
for a given parameterization and
a model complexity penalization
term (following the principle
of Occam’s razor). Related to
Akaike Information Criterion (AIC).
[Bis06:4.4.1]


Bayesian learning: A machine learning
approach based on conditional
probability following the principle of
Bayes’ rule. For example, see Bayesian
classifier. [Bar12:Ch. 9]


Bayesian model learning: See 
probabilistic model learning. 


Bayesian model selection: See 
Bayesian information criterion. 


Bayesian network: Another name for 
a directed graphical model. [WP: 
Bayesian_network] 


Bayesian Occam’s razor: An extension 
to the principle of Occam’s razor, fol- 
lowing Bayes’ rule, stating that the 
model (or hypothesis) with the great- 
est likelihood given the data will dom- 
inate the posterior probability, hence 
it is the best model choice. See 
also Bayesian information criterion. 
[Mac92] 


Bayesian parameter inference: A 
conditional probability parameter 
estimation approach following the 
principle of Bayes’ rule. Also known as 
Bayesian parameter estimation. [KS88] 


Bayesian saliency: A saliency map gen- 
eration approach, within an image 
or another space, where saliency is 
defined by the likelihood of a given 
observation (e.g., value) in relation to 
all the others in the set. [GX11:Ch. 9] 


Bayesian statistical model: Consider a
statistical model $M$ with a parameter
vector $\theta$. A common approach to
statistical modeling is to estimate
the parameters of the model given
data $D$, e.g., by maximum likelihood
estimation. The Bayesian approach
instead uses probability to quantify
the uncertainty about $\theta$. We start
with a prior distribution $p(\theta|M)$.
This is combined with the likelihood
function $p(D|\theta, M)$ to yield the
posterior distribution $p(\theta|D, M)$,
i.e.,

$$p(\theta|D, M) = \frac{p(D|\theta, M)\,p(\theta|M)}{p(D|M)}$$

The normalizer $p(D|M)$ is called
the marginal likelihood. In words
this equation can be written as
posterior = (likelihood × prior) / marginal likelihood.
The model can be used to make predictions by
averaging over the posterior; e.g., for
a supervised learning task with input $x$
and output $y$, we have

$$p(y|x, D, M) = \int p(y|x, \theta, M)\,p(\theta|D, M)\,d\theta$$

[Bis06:1.2.3]
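
A case where the posterior and the predictive integral have closed forms is a Bernoulli likelihood with a Beta prior on its parameter. This conjugate pairing is an assumption of the sketch below, chosen because it needs no numerical integration; the function names and the data are hypothetical.

```python
def beta_bernoulli_posterior(data, prior_a=1.0, prior_b=1.0):
    """Posterior Beta(a, b) parameters after observing binary data.

    The prior Beta(prior_a, prior_b) is conjugate to the Bernoulli
    likelihood, so the update just adds the observed counts.
    """
    ones = sum(data)
    zeros = len(data) - ones
    return prior_a + ones, prior_b + zeros


def predictive_probability_of_one(post_a, post_b):
    """p(y = 1 | D): the Bernoulli parameter averaged over the posterior."""
    return post_a / (post_a + post_b)


def maximum_likelihood_estimate(data):
    """Point estimate of the Bernoulli parameter, for comparison."""
    return sum(data) / len(data)


data = [1, 1, 0, 1]
posterior_a, posterior_b = beta_bernoulli_posterior(data)  # uniform prior
bayes_prediction = predictive_probability_of_one(posterior_a, posterior_b)
ml_prediction = maximum_likelihood_estimate(data)
```

With only four observations, the Bayesian prediction (2/3) is pulled toward the prior mean of 0.5 compared with the maximum likelihood estimate (0.75), reflecting the remaining uncertainty about the parameter.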


Bayesian temporal model: An exten- 
sion of the Bayesian model to the tem- 
poral domain to consider tightly corre- 
lated temporal data, such as in facial 
expression analysis where the next 
state is predicted at time t given the
state at time t - 1 together with the
observation at time t. [GX11:4.2.2] 


BDRF: See bidirectional reflectance
distribution function.


beam splitter: An optical system that 
divides unpolarized light into two 
orthogonally polarized beams, at 90° 
to each other, as in this figure: 
[Hec87:4.3.4] 


[Figure: an incoming beam divided into two beams by a beam splitter]


behavior affinity matrix: An affinity 
matrix capturing the similarity of 
behaviors for behavior classification 
and similar tasks. 


behavior analysis: Model based 
vision techniques for identifying 
and tracking behavior in humans. 
Often used for threat analysis. [WP: 
Applied_behavior_analysis] 


behavior class distribution: 1) The 
global prior distribution of multiple 
classes within a behavior classification 
task. 
2) The within class distribution of a 
given class when used in a behavior 
classification task. See also within-class 
scatter matrix and distribution overlap. 


behavior classification: The 
classification of a given activity
sequence to one of a discrete set of
labels corresponding to an a priori 
behavior model. [WM00] 


behavior correlation: The use of 
techniques generally derived from 
cross-correlation matching to the 
spatio-temporal analysis of video 
sequences for behavior analysis and 
behavior detection. [SI05] 


behavior detection: The detection of 
specific behavior patterns within video 
sequences. See also behavior analysis 
and anomalous behavior detection. 
[GX11:Ch. 8] 


behavior footprint: A temporal
statistical distribution of behavior
patterns occurring at a given pixel or
region of interest within an image.
Records behavior occurrence at that
point in the image. Commonly represented
as a histogram: [GX11:10.1]

[Figure: histogram over behavior type]


behavior hierarchy: A hierarchical 
behavior representation consisting of 
three layers: atomic actions (i.e., basic 
limb movement sequences); actions as 
a sequence of atomic actions (e.g., run- 
ning and walking); and activities (or 
events) composed of a sequence of 
actions over space and time (e.g., a per- 
son walking to the car to fetch their 
bag). [GX11:1.1] 


behavior interpretation: A general 
term referring to the extraction of high- 
level semantic descriptions of behavior 
patterns occurring in video sequences. 
See also behavior analysis and general 
image interpretation. [GX11:Ch. 3] 


behavior learning: Generation of goal- 
driven behavior models by some 
learning algorithm, e.g., reinforcement 
learning. [SB98:1.1] 


behavior localization: The localization 
of a given behavior occurrence or 
behavior pattern within a single image 
(i.e., object localization), within a 
scene context (i.e., 3D localization) or
within a video sequence (i.e., temporal
localization). The figure shows temporal
localization: [GX11:6.4]

[Figure: temporal localization of the behavior "walking across road to car park"]


behavior model: A pre-defined or 
learned model of a given behav- 
ior used for the tasks of behavior 
detection, behavior classification and 
general behavior analysis. Akin to the 
use of models in model-based object 
recognition, whereby the model is 
matched against a given unseen behav- 
ior instance to aid semantic under- 
standing. [GX11:Ch. 11] 


behavior pattern: A temporal sequence 
of actions (or events) forming an 
activity that conforms to a given 
human (or animal) behavior. Gen- 
erally represented as a temporal 
feature vector. The tasks of behavior 
analysis, behavior detection and 
behavior classification thus form a 
specialization of generalized pattern 
recognition within this context. See 
also behavior hierarchy and behavior 
model. [GX11:8.1.1] 


behavior posterior: The posterior
probability, $P(Z_t|e_t)$, at time $t$ of the
occurrence of a given behavior, $Z_t$, in
a behavior pattern of events, $e_t$. See
also behavior hierarchy. [GX11:9.1] 


behavior prediction: The prediction 
of a given behavior pattern, gener- 
ally at a given location in a multi- 
camera system, based on current 
knowledge of behavior patterns 
until the current point in time. In 
addition, for the multi-camera case, 
implicit (or explicit) knowledge of the 
multi-camera topology. See also multi- 
camera behavior correlation. 
[GX11:Ch. 15] 


behavior profile: A semantic label- 
ing of a given behavior model to 
correspond to an a priori behavior
pattern. Commonly used in anomalous
behavior detection. Often used synony- 
mously with behavior model. [GX11: 
Ch. 8] 


behavior profiling: The use of a 
behavior profile in a given behavior 
analysis task. [GX11:Ch. 8] 


behavior representation: See behavior
model.


behavioral context: 1) The context in 
which a computer vision system oper- 
ates with regards to the actions of 
agents in its environment. 
2) Contextual information used to aid 
in behavior interpretation. [GX11:Ch. 
2] 


behavioral saliency: The consideration 
of the notion of salience with regard to 
behavior analysis. The identification of 
behavior patterns which deviate from 
that considered normal. See anomalous 
behavior detection. [BI05] 


belief network: Another name for a 
directed graphical model or Bayesian 
network. [Mur12:6.2] 


belief propagation: Assume a joint 
probability distribution over a set 
of random variables defined by a 
probabilistic graphical model. With 
knowledge of the values of some 
set of variables e (the evidence), it 
is desired to carry out probabilistic 
inference to obtain the conditional 
distribution for (a subset of) the 
remaining variables $x$, i.e., $p(x|e)$.
In belief propagation, the posterior 
marginal distribution of each variable 
in x is obtained via a message-passing 
process. The algorithm is exact for 
graphs without loops; more generally 
the junction tree algorithm can be 
used to compute posterior marginals 
for any probabilistic graphical model. 
The forward-backward algorithm is 
a special case of belief propagation 
applied to the hidden Markov model 
(HMM). [Bis06:8.4.4] 


Beltrami flow: A noise suppression 
technique where images are treated as 
surfaces and the surface area is min- 
imized in such a way as to preserve 
edges. See also diffusion smoothing. 
[SKM98] 


bending energy: 1) A metaphor borrowed
from the mechanics of thin
metal plates. If a set of landmarks is 
distributed on two infinite flat metal 
plates and the differences in the coor- 
dinates between the two sets are ver- 
tical displacements of the plate, one 
Cartesian coordinate at a time, then the 
bending energy is the energy required 
to bend the metal plate so that the land- 
marks are coincident. When applied to 
images, the sets of landmarks may be 
sets of features. 
2) Denotes the amount of energy that 
is stored because of an object’s shape. 
[DLSZ91] 


Bernoulli distribution: The distribu- 
tion of a binary discrete random 
variable which takes on value 1 with 
probability p and value 0 with probability
1 - p. [Bis06:2.1]


best next view: See next view planning. 


between-class scatter matrix: Given
some vector data grouped into a number
of classes, with $m_i$ being the mean
of class $i$, and $m$ being the overall
mean of the data, the between-class
scatter matrix is defined as
$S_B = \sum_i n_i (m_i - m)(m_i - m)^T$,
where $n_i$ is the number of data points
in class $i$. It is used in the computation of
the Fisher linear discriminant (FLD). 
See also within-class scatter matrix. 
[Bis06:4.1] 
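
The definition above can be illustrated with a minimal Python sketch. The function name `between_class_scatter` and the plain-list data layout are assumptions made for this example; they do not come from the cited source.

```python
def between_class_scatter(points, labels):
    """Return S_B = sum_i n_i (m_i - m)(m_i - m)^T as a nested list."""
    dim = len(points[0])
    n_total = len(points)
    # Overall mean m of all data points.
    overall_mean = [sum(p[k] for p in points) / n_total for k in range(dim)]

    # Group the points by class label.
    classes = {}
    for point, label in zip(points, labels):
        classes.setdefault(label, []).append(point)

    scatter = [[0.0] * dim for _ in range(dim)]
    for members in classes.values():
        n_i = len(members)
        class_mean = [sum(p[k] for p in members) / n_i for k in range(dim)]
        diff = [class_mean[k] - overall_mean[k] for k in range(dim)]
        # Accumulate the outer product n_i * diff * diff^T.
        for r in range(dim):
            for c in range(dim):
                scatter[r][c] += n_i * diff[r] * diff[c]
    return scatter
```

With two classes of two points each, e.g., `between_class_scatter([(0, 0), (2, 0), (4, 4), (6, 4)], [0, 0, 1, 1])`, every class mean lies at offset (±2, ±2) from the overall mean, so all four entries of the result are equal.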


Bhattacharyya distance: A measure of
the (dis)similarity of two probability
distributions. Given two arbitrary
distributions $p_i(x)$, $i = 1, 2$, the
Bhattacharyya distance between them is: [PS06:4.5]


$$d = -\log \int \sqrt{p_1(x)\, p_2(x)}\, dx$$
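
For distributions given as histograms, the integral is commonly replaced by a sum over bins. The sketch below assumes this discrete form; the function name `bhattacharyya_distance` and the convention of returning infinity for non-overlapping distributions are choices made for this example.

```python
import math

def bhattacharyya_distance(p1, p2):
    """Discrete Bhattacharyya distance between two normalized histograms."""
    # Bhattacharyya coefficient: sum over bins of sqrt(p1 * p2).
    coefficient = sum(math.sqrt(a * b) for a, b in zip(p1, p2))
    if coefficient == 0.0:
        return math.inf  # no overlap between the two distributions
    return -math.log(coefficient)
```

Identical distributions give a distance of zero, and the distance grows as the overlap between the two histograms shrinks.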


bias field estimation: A process used in 
magnetic resonance imaging (MRI) to 
estimate the intensity inhomogeneities 
present in the measured imagery 
because of properties of the imag- 
ing device itself. Bias is corrected via 
filtering prior to viewing or image 
analysis. [LXAG09]


BIC: See Bayesian information criterion. 


bicubic spline interpolation: A special 
case of surface interpolation that uses 
cubic spline functions in two dimensions.
This is like bilinear surface
interpolation except that the interpo- 
lating surface is curved, instead of flat. 
[WP:Bicubic_interpolation#Bicubic_ 
convolution_algorithm] 


bidirectional reflectance distribution
function (BRDF): If the energy arriving
at a surface patch is denoted
$E(\theta_i, \phi_i)$ and the energy radiated
in a particular direction is denoted
$L(\theta_e, \phi_e)$ in polar coordinates, then
the BRDF is defined as the ratio of the
energy radiated from a patch of a surface
in some direction to the amount
of energy arriving there. The radiance
is determined from the irradiance by


$$L(\theta_e, \phi_e) = f(\theta_i, \phi_i, \theta_e, \phi_e)\, E(\theta_i, \phi_i)$$


where the function $f$ is the bidirectional
reflectance distribution function.
This function often only depends
on the difference between the incident
angle $\phi_i$ of the ray falling on the surface
and the angle $\phi_e$ of the reflected ray.
[FP03:4.2.2]


bidirectional texture function (BTF): 
A general solution for texture modeling 
in computer graphics (notably in 
texture synthesis) for textures that 
change appearance with illumination 
and viewing angle. Modeled in a similar
way to a bidirectional reflectance
distribution function (BRDF) such that
the texture output is indexed by incident
illumination and viewing angle.
Unlike the BRDF where the response
value is simply reflectance, response
values of the BTF are whole image
patches. [TFCS11:8.4.2] 


bilateral filtering: A non-iterative alternative
to anisotropic filtering where
images can be subject to smoothing but
edges present in the images are pre- 
served. [WP:Bilateral_filter] 


bilateral smoothing: See bilateral
filtering.


bilateral symmetry: A constrained 
property of symmetry such that a given 
object (or 2D shape) is divisible into 
symmetrical halves on either side of a 
single unique line or plane. A common 
property of many biological organisms, 
mammals and plants: [Tyl96] 


[Figure: Axes of bilateral symmetry]


bilinear surface interpolation: Determining
the value of a function
$f(x, y)$ at an arbitrary location $(x, y)$
when only discrete samples
$f_{ij} = f(x_i, y_j)$ are available. The samples
are arranged on a 2D grid, so
the value at point $(x, y)$ is interpolated
from the values at the four surrounding
points. In the diagram below,


$$f_{\mathrm{bilinear}}(x, y) = \frac{A + B}{(d_1 + d_1')(d_2 + d_2')}$$


where

$$A = d_1 d_2 f_{11} + d_1' d_2 f_{21}$$
$$B = d_1 d_2' f_{12} + d_1' d_2' f_{22}$$


The gray lines offer an easy aide-memoire:
each function value $f_{ij}$ is multiplied
by the two closest $d$ values.
[TV98:8.4.2]
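
A minimal Python sketch of bilinear interpolation on a unit-spaced grid follows. It uses the common fractional-offset form rather than the $d$ notation of the diagram; the function name `bilinear_interpolate` and the `grid[row][column]` layout are assumptions made for this example.

```python
import math

def bilinear_interpolate(grid, x, y):
    """Interpolate grid at (x, y); grid[j][i] is the sample at column i, row j.

    Assumes 0 <= x <= width - 1 and 0 <= y <= height - 1.
    """
    i0, j0 = math.floor(x), math.floor(y)
    # Clamp the far neighbors so the last row/column can be queried.
    i1 = min(i0 + 1, len(grid[0]) - 1)
    j1 = min(j0 + 1, len(grid) - 1)
    tx, ty = x - i0, y - j0  # fractional offsets inside the grid cell
    # Interpolate along x on the two bracketing rows, then along y.
    top = (1 - tx) * grid[j0][i0] + tx * grid[j0][i1]
    bottom = (1 - tx) * grid[j1][i0] + tx * grid[j1][i1]
    return (1 - ty) * top + ty * bottom
```

At a grid point the function returns the stored sample, and at the center of a cell it returns the mean of the four surrounding samples.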


bilinear transform: In image geometry,
a spatial non-affine transformation
characterized by a pixel-wise mapping
where the transformed pixel coordinate,
$(x', y')$, is a multiplication of
two linear functions of the original
pixel coordinate, $(x, y)$. For example,
$x' = a_0 + a_1 x + a_2 y + a_3 xy$,
$y' = a_4 + a_5 x + a_6 y + a_7 xy$ for parameter
coefficients $a_0$ to $a_7$. In general, transforms
facilitate translation, rotation
and warping in the pixel-wise mapping,
$(x, y) \to (x', y')$. [SB11:7.11]


bilinearity: A function of two variables
$x$ and $y$ is bilinear if it is linear in $y$
for fixed $x$ and linear in $x$ for fixed
$y$. For example, if $x$ and $y$ are vectors
and $A$ is a matrix such that $x^T A y$ is
defined, then the function
$f(x, y) = x^T A y$ is bilinear in $x$ and $y$.
[WP:Bilinear_form]

bimodal histogram: A histogram with 
two pronounced peaks, or modes. This 
is a convenient intensity histogram 
for determining a binarizing threshold: 
[RYS95] 


bin-picking: The problem of getting
a robot manipulator equipped with 
vision sensors to pick parts, e.g., 
screws, bolts or components of a given 
assembly, from a random pile. A clas- 
sic challenge for hand-eye robotic sys- 
tems, involving at least segmentation, 
object recognition in clutter and pose 
estimation. [RK96] 


binarization: See thresholding. 


binary classifier: A two-class 
classification approach often used 
in explicit two-class problems or for 
detection problems, indicating the 
presence, true, or absence, false in a 
two-state decision. [Bar12:19.5.1] 


binary decision: A two-state decision, 
{0, 1} or {false, true}, generally output 
from a binary classifier. [Bar12:19.5.1] 

binary graph cut optimization: Parti- 
tioning into two discrete sets via the 
graph cut optimization algorithm. (See 
graph cut). 


binary image: An image whose pixels
can be in either an “on” or an “off”
state, represented by the integers 1 and 
0 respectively: [DH73:7.4] 


binary mathematical morphology: A 
group of shape-based operations that 
can be applied to binary images, 
based around a few simple math- 
ematical concepts from set the- 
ory. Common usages include noise 
reduction, image enhancement and 
image segmentation. The most basic 
operators, the dilate operator and the 
erode operator, take two pieces of 
data as input: the input binary image 
and a structuring element (also known 
as a kernel). Virtually all other math- 
ematical morphology operators can 
be defined in terms of combinations 
of erosion and dilation along with 
set operators such as intersection and 
union. Some of the more important are 
the open operator, the close operator 
and skeletonization. Binary morphol- 
ogy is a special case of gray scale 
mathematical morphology. See also 
mathematical morphology operation. 
[SQ04:7.1] 
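
The two basic operators can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the image is a nested list of 0/1 values, the structuring element is a list of (row, column) offsets, pixels outside the image are treated as 0, and the names `dilate`, `erode` and `SQUARE_3X3` are chosen for this example.

```python
# Structuring element: offsets of a 3 x 3 square centered on the origin.
SQUARE_3X3 = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]

def _pixel(image, r, c):
    """Return image[r][c], treating out-of-bounds positions as 0."""
    if 0 <= r < len(image) and 0 <= c < len(image[0]):
        return image[r][c]
    return 0

def dilate(image, element=SQUARE_3X3):
    """Output is 1 where the element, placed at any 'on' pixel, covers the position."""
    return [[1 if any(_pixel(image, r - dr, c - dc) for dr, dc in element) else 0
             for c in range(len(image[0]))]
            for r in range(len(image))]

def erode(image, element=SQUARE_3X3):
    """Output is 1 only where the element fits entirely inside the 'on' pixels."""
    return [[1 if all(_pixel(image, r + dr, c + dc) for dr, dc in element) else 0
             for c in range(len(image[0]))]
            for r in range(len(image))]
```

Composite operators follow directly: under these conventions, the open operator is `dilate(erode(image))` and the close operator is `erode(dilate(image))`.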


binary moment: Given a binary image
$B(i, j)$, there is an infinite family of
moments indexed by the integer values
$p$ and $q$. The $pq$th moment is given by
$m_{pq} = \sum_i \sum_j i^p j^q B(i, j)$. [KH90]
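
As a worked illustration, the moment sum can be written directly in Python. The sketch assumes that $i$ indexes rows and $j$ indexes columns of a nested list of 0/1 values; the function name `binary_moment` is chosen for this example.

```python
def binary_moment(image, p, q):
    """Return m_pq = sum_i sum_j i^p * j^q * B(i, j) for a 0/1 image."""
    return sum((i ** p) * (j ** q) * pixel
               for i, row in enumerate(image)
               for j, pixel in enumerate(row))
```

The zeroth moment `binary_moment(image, 0, 0)` is the area of the "on" region, and the ratios of the first moments to the area give its centroid.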


binary noise reduction: A method of 
removing salt-and-pepper noise from 
binary images. For example, a point 
could have its value set to the median 
value of its eight neighbors. [Pet95] 


binary object recognition: Model- 
based techniques and algorithms used 
in object recognition from binary 
images. [RHBL07]


binary operation: An operation, such 
as the pixel subtraction operator, 
that takes two images as inputs. 
[SOS00:2.3] 


binary region skeleton: See skeleton. 


binocular: A system that has two 
cameras looking at the same scene 
simultaneously usually from a simi- 
lar viewpoint. See also stereo vision. 
[TV98:7.1] 


binocular stereo: A method of deriving 
depth information from a pair of cal- 
ibrated cameras set at some distance 
apart and pointing in approximately
the same direction. Depth information
comes from the parallax between the 
two images and relies on being able to 
derive the same feature in both images. 
[JKS95:12.6] 


binocular tracking: A method of 
tracking objects or features in 3D using 
binocular stereo. [BS99] 


biometric feature: A feature with
properties that can be used for
identity verification of individuals,
e.g., features used in gait analysis,
face recognition and fingerprint
identification. [NA05:3.5.5]


biometrics: The science of discriminat- 
ing individuals from accurate measure- 
ment of their physical features, such 
as retinal lines, finger lengths, finger- 
prints, voice characteristics and facial 
features. [WP:Biometrics] 


bipartite matching: Graph matching 
technique often applied in model- 
based vision to match observations 
with models or stereo matching to 
solve the correspondence problem. 
Assume a set $V$ of nodes partitioned
into two non-intersecting subsets $V^1$
and $V^2$. In other words, $V = V^1 \cup V^2$
and $V^1 \cap V^2 = \emptyset$. The only arcs $E$ in
the graph lie between the two subsets,
i.e., $E \subset \{V^1 \times V^2\} \cup \{V^2 \times V^1\}$. This
is the bipartite graph. The bipartite
matching problem is to find a maximal
set of nodes from the two subsets
connected by arcs such that each node
is connected by exactly one arc to a
node in the other subset. One maximal
matching in the graph below, with
sets $V^1 = \{A, B, C\}$ and $V^2 = \{X, Y\}$,
is pairs (A, Y) and (C, X). The 
selected arcs are solid and other arcs 
are dashed: [WP:Matching_(graph_ 
theory)#Maximum_matchings_in_ 
bipartite_graphs] 
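
One common way to compute such a matching is the augmenting-path method, sketched below in Python. This is an illustration rather than the method of any cited source; the function name `maximum_bipartite_matching` and the dictionary graph layout are assumptions, and the arcs in the usage note are hypothetical because the original figure is not reproduced here.

```python
def maximum_bipartite_matching(arcs):
    """Match nodes of V1 to nodes of V2 using augmenting paths.

    arcs maps each V1 node to the list of V2 nodes it is connected to.
    Returns a dict mapping each matched V2 node to its V1 partner.
    """
    match = {}  # V2 node -> V1 node

    def try_assign(u, visited):
        # Try to give u a partner, re-assigning earlier partners if needed.
        for v in arcs[u]:
            if v in visited:
                continue
            visited.add(v)
            if v not in match or try_assign(match[v], visited):
                match[v] = u
                return True
        return False

    for u in arcs:
        try_assign(u, set())
    return match
```

With the hypothetical arcs `{"A": ["Y"], "B": ["Y"], "C": ["X", "Y"]}`, the function pairs A with Y and C with X, leaving B unmatched because only two nodes are available in the second subset.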


bit map: An image with one bit per pixel. 
[JKS95:3.3.1] 


bit-plane encoding: An image compres- 
sion technique where the image is bro- 
ken into bit planes and run length cod- 
ing is applied to each plane. To get 
the bit planes of an eight-bit gray scale 
image, the picture has a Boolean AND 
operator applied with the binary value 
corresponding to the desired plane. 
For example, ANDing the image with 
00010000 gives the fifth bit plane. 
[Jai89:11.2] 
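
The two steps described above, extracting a bit plane with a Boolean AND and run length coding the result, can be sketched in Python. The function names `bit_plane` and `run_lengths`, and the convention that plane 1 is the least significant bit, are assumptions made for this example.

```python
def bit_plane(image, plane):
    """Return the given bit plane (1 = least significant) as a 0/1 image."""
    mask = 1 << (plane - 1)  # plane 5 gives the mask 00010000
    return [[1 if pixel & mask else 0 for pixel in row] for row in image]

def run_lengths(bits):
    """Run length code a sequence of bits as (value, count) pairs."""
    runs = []
    for bit in bits:
        if runs and runs[-1][0] == bit:
            runs[-1][1] += 1
        else:
            runs.append([bit, 1])
    return [tuple(run) for run in runs]
```

For instance, `bit_plane(image, 5)` applies the mask 00010000 to every pixel, and each row of the result can then be passed to `run_lengths`.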


bitangent: See curve bitangent. [WP: 
Bitangent] 


bitshift operator: The bitshift operator 
shifts the binary representation of each 
pixel to the left or right by a set number 
of bit positions. Shifting 01010110 
right by two bits gives 00010101. The 
bitshift operator is a computationally 
cheap method of dividing or multi- 
plying an image by a power of two. 
A shift of n positions is a multiplication
or division by $2^n$. [WP:Bitwise_
operation#Bit_shifts] 
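
A minimal Python sketch of the operator on an eight-bit image follows. The function name `shift_image`, the sign convention (positive shifts left, negative shifts right) and the discarding of bits shifted beyond the bit depth are assumptions made for this example.

```python
def shift_image(image, bits, bit_depth=8):
    """Shift every pixel left (bits > 0) or right (bits < 0)."""
    limit = (1 << bit_depth) - 1  # 255 for an eight-bit image
    if bits >= 0:
        # Left shift multiplies by 2**bits; overflowing bits are dropped.
        return [[(pixel << bits) & limit for pixel in row] for row in image]
    # Right shift divides by 2**(-bits), discarding the remainder.
    return [[pixel >> -bits for pixel in row] for row in image]
```

Calling `shift_image([[0b01010110]], -2)` reproduces the example in the entry, giving `[[0b00010101]]`.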


bivariate time-series: A data set 
with two variables varying temporally
(i.e., over time) sampled in
unison. 


black body radiation: The electromag- 
netic radiation emitted by a black body 
object (an opaque, non-reflective mate- 
rial object) held at constant, uniform 
temperature (e.g., a bulb filament or 
coating in an artificial light source). 
The spectrum of the radiation emitted 
(i.e., the color of the light) is dependent
only on the temperature of the
body and increases in intensity over the 
color range from dull red to brilliant 
blue-white (through the visible portion 
of the electromagnetic spectrum) as 
temperature increases. The name black 
body refers to the visual appearance 
of such objects at room temperature. 
[FP03:6.1.2] 


blade edge: A surface orientation discon- 
tinuity where a fold edge is seen against 
the background with the other side of 
the fold: [BB82:Ch. 9] 


[Figure: Blade edges and fold edges]


blanking: Clearing a CRT or video 
device. The vertical blanking interval 
(VBI) in television transmission is used
to carry data other than audio and 
video. [WP:Blanking_(video)] 


blending operator: An image process- 
ing operator that creates a third image 
C by a weighted combination of the 
input images A and B. In other words, 
$C(i, j) = \alpha A(i, j) + \beta B(i, j)$ for two
scalar weights $\alpha$ and $\beta$. Usually, $\alpha + \beta = 1$.
The results of some process can
be illustrated by blending the original 
and result images. In this figure, the 
blending adds a detected boundary to 
the original image: [WGG99] 
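
The weighted combination can be sketched in Python as follows. The function name `blend` and the default of setting the second weight to one minus the first are assumptions made for this example.

```python
def blend(image_a, image_b, alpha, beta=None):
    """Return C(i, j) = alpha * A(i, j) + beta * B(i, j)."""
    if beta is None:
        beta = 1.0 - alpha  # the usual choice: weights sum to one
    return [[alpha * a + beta * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(image_a, image_b)]
```

With `alpha = 0.5` each output pixel is the average of the two input pixels; values of `alpha` near 1 make the result resemble image A.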


blind deconvolution: Performing 
image deconvolution without prior 
knowledge of the convolution 
operator that has been applied to 
the image (or the optical transfer 
function or point spread function 
characteristics of the camera sensor at 
the time of image capture). Associated 
with image deblurring. [SB11:6.9]


blind image forensics: Performing 
digital image forensic techniques on 
a digital image without prior knowl- 
edge of whether the image has been 
manipulated (e.g., there is no underly- 
ing file security checksum or access to 
the original image). [NCLS06] 


blob analysis: A process used in medical
image analysis. There are four steps:

1. Derive optimum foreground/
background threshold to segment
objects from their background.

2. Binarize the images by applying a
thresholding operation.

3. Perform region growing and assign
a label to each discrete group (blob)
of connected pixels.

4. Extract physical measurements
from the blobs.

[WP:Blob_detection]
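
Steps 2 to 4 above can be sketched in Python. This minimal illustration assumes the threshold from step 1 is already known, uses 4-connectivity for region growing and reports blob area as the only measurement; the function name `blob_areas` is chosen for this example.

```python
def blob_areas(image, threshold):
    """Threshold, label 4-connected blobs and measure their areas.

    Returns (labels, areas): a label image (0 = background) and a dict
    mapping each blob label to its pixel count.
    """
    # Step 2: binarize by thresholding.
    binary = [[1 if pixel > threshold else 0 for pixel in row] for row in image]
    height, width = len(binary), len(binary[0])
    labels = [[0] * width for _ in range(height)]
    areas = {}
    next_label = 1
    for r in range(height):
        for c in range(width):
            if binary[r][c] and not labels[r][c]:
                # Step 3: grow a region from this unlabeled seed pixel.
                labels[r][c] = next_label
                stack = [(r, c)]
                area = 0
                while stack:
                    y, x = stack.pop()
                    area += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and binary[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = next_label
                            stack.append((ny, nx))
                # Step 4: record a physical measurement (here, the area).
                areas[next_label] = area
                next_label += 1
    return labels, areas
```

Other measurements, such as centroid or bounding box, could be accumulated in the same loop.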


blob extraction: Part of blob analysis. 
See connected component labeling. 
[WP:Connected-component_labeling] 


block coding: A class of signal coding 
techniques. The input signal is parti- 
tioned into fixed-size blocks, and each 
block is transmitted after translation to 
a smaller (for image-compression) or 
larger (for error-correction) block size. 
[Jai89:11.1] 


blocking artifact: An artifact result- 
ing from the use of a block coding 
compression scheme caused by infor- 
mation loss resulting either from the 
use of lossy compression or from 
an error in data transmission (data 
transmission errors are more common 
in video compression for streaming 
video). See also compression noise: 
[Lan12:Compression] 


[Figure: Blocking artifacts (compression)]


blocks world: The simplified prob- 
lem domain in which much early 
research into artificial intelligence and 
computer vision was carried out. The 
essential feature of the blocks world is 
the restriction of analysis to simplified 
geometric objects such as polyhedra 
and the assumption that geometric 
descriptions, such as image edges, can 
easily be recovered from the image: 
[Nal93:Ch. 4] 


blooming: Too much light entering a 
digital optical system causes saturation 
of the CCD pixels so that charge over- 
spills into surrounding elements giving 
vertical or horizontal streaking in the 
image (depending on the orientation 
of the CCD). [CS09:3.3.5] 


Blum’s medial axis: See medial axis 
transform. 


blur: A measure of sharpness in an 
image. Blurring can arise from a variety
of causes including: the sensor
being defocused; noise in the
environment or image capture process;
motion of the target or sensor;
an image-processing operation. [WP:
Blur]


blur estimation: Estimation of the point 
spread function or optical transfer 
function for the process of image 
deblurring by deconvolution. See also
image restoration. [SB11:6.9] 


body part tracking: The tracking of 
human body parts in a video sequence, 
often used for human pose estimation. 
[PA06] 


boosting: An ensemble learning 
approach for the combination of 
multiple weak learners (classifiers) 
where each classifier has a weight 
proportional to its performance on 
the training set. The base (weak)
classifiers are trained in sequence
such that the $n$th classifier is trained
over a training set re-weighted to give
greater emphasis to training examples
upon which the previous $(n - 1)$
classifiers performed poorly (e.g.,
AdaBoost). Contrasts with bagging, in
which all weak learners have equal 
weight in the ensemble and are trained 
independently. [Bis06:14.3] 


border detection: See boundary
detection.


border tracing: Given a pre-labeled (or 
segmented) image, the border is the 
inner layer of each region’s connected 
pixel set. It can be traced using a simple 
8-connective or 4-connective stepping 
procedure in a 3 x 3 neighborhood. 
[Nev82:8.1.4] 


bottom-up: An approach that starts from 
the smallest entities within a given 
topology (e.g., image pixels, mesh 
vertexes or low-level features) and per- 
forms agglomerative clustering, gen- 
eral global structure extraction or 
feature detection by successive merg- 
ing of scene information in a hier- 
archical methodology targeting gen- 
eral image understanding. Opposite 
of top-down. In computer vision, 
describes algorithms that use the data 
to generate hypotheses at a low level, 
that are refined as the algorithm pro- 
ceeds. [Sch89:Ch. 6] 


bottom-up event detection: An event 
detection approach that follows a 
bottom-up methodology. Within this 
context one could use motion features, 
action units or atomic actions as the 
lowest level in the hierarchy. See also 
behavior hierarchy. 


bottom-up segmentation: A seg- 
mentation approach that involves 
agglomerative clustering. See
bottom-up. [BU08]


boundary: A general term for the 
lower dimensional structure that sep- 
arates two objects, such as the curve 
between neighboring surfaces or the 
surface between neighboring volumes. 
[JKS95:2.5.1] 


boundary description: Functional,
geometry-based or set-theoretic
description of a region boundary.
For an example, see chain code. 
[Dav90:7.8] 


boundary detection: An 
image-processing algorithm that 
finds and labels the edge pixels 
between two neighboring image 
segments after segmentation. The 
boundary represents physical discon- 
tinuities in the scene, e.g., changes 
in color, depth, shape or texture. 
[Nev82:7.1] 


boundary grouping: An 
image-processing algorithm that 
attempts to complete a fully con- 
nected image-segment boundary from 
many broken pieces. A boundary 
might be broken because it is com- 
monplace for sharp transitions in 
property values to appear in the image 
as slow transitions; it might sometimes 
disappear for reasons including: 
noise, blur, digitization artifacts, poor 
lighting and surface irregularities. 
[LP90] 


boundary length: The length of the 
boundary of an object. See also 
perimeter. [WP:Perimeter] 


boundary matching: See curve
matching.


boundary property: Characteristic of 
a boundary, such as arc length or 
curvature. [KR88] 


boundary representation: See 
boundary description and B-rep. 


boundary-region fusion: A region 
growing segmentation approach 
where two adjacent regions are 
merged when their characteristics are 
close enough to pass some similarity 
test. The candidate neighborhood for 
testing similarity can be the pixels 
lying near the shared region boundary. 
[WP:Region_growing] 


boundary segmentation: See curve 
segmentation. 


bounding box: The smallest rectangu- 
lar prism that completely encloses an 
object or a set of points. The ratio 
of the lengths of the box sides is 
often used as a classification metric 
in model-based recognition. [WP:
Minimum_bounding_box] 


box filter: An alternative name for the 
mean filter. The name “box” derives 
from the use of a local neighborhood 
in which all of the pixels have 
equal weight (i.e., $1/n^2$ for an $n \times n$
neighborhood). [Dav90:Ch. 3] 
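
A minimal Python sketch of a box filter follows. It assumes an odd neighborhood size and leaves border pixels unchanged, which is only one of several possible border policies; the function name `box_filter` is chosen for this example.

```python
def box_filter(image, n=3):
    """Apply an n x n mean filter; border pixels are copied unchanged."""
    half = n // 2
    height, width = len(image), len(image[0])
    output = [[float(pixel) for pixel in row] for row in image]
    for r in range(half, height - half):
        for c in range(half, width - half):
            total = sum(image[r + dr][c + dc]
                        for dr in range(-half, half + 1)
                        for dc in range(-half, half + 1))
            # Every pixel in the neighborhood has equal weight 1 / n^2.
            output[r][c] = total / (n * n)
    return output
```

Applied to a 3 x 3 image whose only nonzero pixel is a central value of 9, the filter replaces the center with 1.0, the mean of its neighborhood.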


BRDF: See bidirectional reflectance 
distribution function. [WP: 
Bidirectional_reflectance_distribution_ 
function] 


breakpoint detection: See curve
segmentation.


breast scan analysis: See mammogram 
analysis. 


Brewster’s angle: When light reflects 
from a dielectric surface it is polarized 
perpendicularly to the surface normal. 
The degree of polarization depends 
on the incident angle and the refrac- 
tive indices of the air and reflective 
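medium. The angle of maximum polarization
is called Brewster’s angle and is
given by


$$\theta_B = \tan^{-1}\left(\frac{n_2}{n_1}\right)$$


where $n_1$ and $n_2$ are the refractive
indices of the two materials (the light
travels in the medium of index $n_1$ and
reflects from the medium of index $n_2$).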
[Hec87:8.6] 
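
As a quick numerical illustration, the angle can be evaluated in Python. The sketch assumes the standard form in which the arctangent is taken of the index of the reflecting medium divided by the index of the medium the light travels in; the function name `brewster_angle_degrees` and the example indices are assumptions.

```python
import math

def brewster_angle_degrees(n_incident, n_reflecting):
    """Return Brewster's angle in degrees for light passing between two media."""
    return math.degrees(math.atan2(n_reflecting, n_incident))
```

For light in air (index about 1.0) reflecting from glass with an assumed index of 1.5, `brewster_angle_degrees(1.0, 1.5)` evaluates to roughly 56.3 degrees.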


brightness: The quantity of radiation 
reaching a detector after incidence on 
a surface. Often measured in lux or 
ANSI lumens. When translated into an 
image, the values are scaled to fit the 
bit patterns available. For example, if 
an eight-bit byte is used, the maxi- 
mum value is 255. See also luminance. 
[DH73:7.2] 


brightness adjustment: Increase or 
decrease in the luminance of an image. 
To decrease, one can linearly interpo- 
late between the image and a pure 
black image. To increase, one can 
linearly extrapolate between a black 
image and the target. The extrapolation
function is


$$v = (1 - \alpha)\, i_0 + \alpha\, i_1$$


where $\alpha$ is the blending factor (often
between 0 and 1), $v$ is the output
pixel value and $i_0$ and $i_1$ are the
corresponding image and black pixels.
See also gamma correction and
contrast enhancement. [WP:Gamma_ 
correction] 
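
The blending function can be sketched in Python under a few assumptions: the black pixel has value 0, results are rounded and clamped to the available range, and a negative blending factor is read as extrapolation away from black (brightening). The function name `adjust_brightness` is chosen for this example.

```python
def adjust_brightness(image, alpha, max_value=255):
    """Blend each pixel with black: v = (1 - alpha) * pixel + alpha * black."""
    black = 0
    return [[min(max_value, max(0, round((1 - alpha) * pixel + alpha * black)))
             for pixel in row]
            for row in image]
```

With `alpha = 0.5` every pixel is halved (darker image); with `alpha = -0.5` every pixel is multiplied by 1.5 and clamped to the maximum value (brighter image).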


brightness constancy: 1) The mech- 
anism of the human visual system 
that stabilizes relative shifts in object 
brightness caused by constantly vary- 
ing illumination in the environment 
such that objects always appear with 
the same lightness. 
2) The assumption made in a computer 
vision approach that object brightness 
is constant despite varying illumination 
in the environment. See also color 
constancy, illumination constancy. In 
this figure, the illumination varies in 
intensity: [Gol10] 


[Figure: Varying illumination intensity]


broadcast video: Refers to video of a 
content type, image quality and encod- 
ing format (see video transmission 
format) targeted at either analog or dig- 
ital transmission to a mass audience. 
Often uses video deinterlacing, lossy 
compression and MPEG encoding. 
[Sze10:8.4.3] 


Brodatz texture: A well-known set of 
texture images often used for testing 
texture-related algorithms. [NA05:8.2] 


building detection: A general term for a
specific, model-based set of algorithms 
for finding buildings in data. The range 
of data used is large, encompassing 
stereo images, range images, and aerial 
and ground-level photographs. [LN98] 


bundle adjustment: An algorithm used 
to optimally determine the three-dimensional
coordinates of points
and camera positions from two-dimensional
image measurements.
This is done by minimizing some cost 
function that includes the model fit- 
ting error and the camera variations. 
The bundles are the light rays between 
detected 3D features and each cam- 
era center. The bundles are itera- 
tively adjusted (with respect to both 
camera centers and feature positions). 
[FP03:13.4.2] 


burn-in: 1) A phenomenon of early tube-based
cameras and monitors where, if
the same image was presented for long 
periods of time, it became permanently 
burnt into the phosphorescent layer. 
Since the advent of modern monitors 
in the 1980s, this no longer happens. 
2) The practice of shipping only elec- 
tronic components that have been 
tested for long periods, in the hope that 
any defects will manifest themselves 
early in the component’s life (e.g., 72 
hours of typical use).
3) The practice of discarding the first
several samples of a Markov chain 
Monte Carlo process in the hope that a 
very low-probability starting point will 
converge to a high-probability point 
before beginning to output samples. 
[NA05:1.4.1] 


butterfly filter: A linear filter designed
to respond to “butterfly” patterns in
images. A small butterfly filter convo- 
lution kernel is 


0 -2 0 
1 2 1 
0 -2 0 


It is often used in conjunction with 
the Hough transform for finding peaks 
in the Hough feature space, particu- 
larly when searching for lines. The line 
parameter values of $(\rho, \theta)$ will generally
give a butterfly shape with a peak at
the approximate correct values. [LB87] 


CAD: See computer-aided design. [WP:
Computer-aided_design] 


calculus of variations: See variational 
approach. 


calibration object: An object or
small scene with easily locatable features
used for camera calibration:
[HZ00:7.5.2] 


camera: 1) The physical device used to 
acquire images. 
2) The mathematical representation of 
the physical device and its characteris- 
tics, such as position and calibration. 
3) A class of mathematical models of 
the projection from 3D to 2D, such 
as affine-, orthographic- or pinhole 
camera. [NA05:1.4.1] 


camera calibration: Methods for deter- 
mining the position and orientation of 
cameras and range sensors in a scene 
and relating them to scene coordinates. 
There are essentially four problems in 
calibration: 


• interior orientation: determining
the internal camera geometry,
including its principal point, focal
length and lens distortion;

• exterior orientation: determining
the orientation and position of the
camera with respect to some absolute
coordinate system;

• absolute orientation: determining
the transformation between two
coordinate systems, and the position
and orientation of the sensor in the
absolute coordinate system from the
calibration points;

• relative orientation: determining
the relative position and orientation
between two cameras from projections
of calibration points in the
scene.

These are classic problems in the field
of photogrammetry. [FP03:Ch. 3]


camera connectivity matrix: An n x 
n matrix modeling relative camera 
view connectivity over a multi-camera 
topology based on the strength of 
multi-camera behavior correlation (or 
activity correlation) between any two 
cameras in the set. See also camera 
topology inference. [GX11:13.3] 


camera coordinates: 1) A viewer- 
centered representation relative to the 
camera. The camera coordinate system 
is positioned and oriented relative to 
the scene coordinate system and this 
relationship is determined by camera 
calibration. 
2) An image coordinate system that 
places the camera’s principal point at 
the origin (0, 0), with unit aspect ratio 
and zero skew. The focal length in cam- 
era coordinates may or may not equal 
1. If image coordinates are such that 
the $3 \times 4$ projection matrix is of the
form


$$\begin{pmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{pmatrix} \left[ R \mid \vec{t}\, \right]$$


then the image and camera coordinate 
systems are identical. [HZ00:5.1] 


camera geometry: The physical geome- 
try of a camera system. See also camera 
model. [Sch89:Ch. 2] 


camera lucida: A (passive) optical pro- 
jection device used by artists (or scien- 
tific illustrators) that projects a view 
of the scene onto the page, under 
a perspective projection, such that 
it can be traced. Pre-dates modern 
photography and is still used in some 
photogrammetry contexts: [HA87] 


[Figure: Camera lucida, showing the projection prism (or mirror)]

camera model: A mathematical model
of the projection from 3D (real-world)
space to the camera image plane. For 
example see pinhole camera model. 
[Sch89:Ch. 2] 


camera motion compensation: See 
sensor motion compensation. 


camera motion estimation: See sensor 
motion estimation. [WP:Egomotion] 


camera pose: The location and orienta- 
tion of the camera in a given frame of 
reference. See pose. [Sze10:Ch. 7] 


camera position estimation: Estima- 
tion of the optical position of the cam- 
era relative to the scene or observed 
structure. This generally consists of six 
degrees of freedom (three for rotation 
and three for translation). It is often 
a component of camera calibration. 
Camera position is sometimes called 
the extrinsic parameters of the camera.
Multiple camera positions may
be estimated simultaneously with the
reconstruction of 3D scene structure 
in structure and motion recovery algo- 
rithms. [SD03] 


camera topology inference: The auto- 
matic topology inference of the rel- 
ative positions and field of view 
overlap (if any) of a set of cameras
in a multi-camera system based
on temporal evaluation of scene 
features, target tracking information 
or higher level multi-camera behavior 
correlation from each camera view. 
[GX11:Ch. 13] 


Canny edge detector: The first of 
the modern edge detectors. It took 
account of the trade-off between sen- 
sitivity of edge detection and accuracy 
of edge localization. The edge detector 
consists of four stages: 

1. Gaussian smoothing to reduce noise 
and remove small details; 

2. gradient magnitude and direction 
calculation; 

3. non-maximal suppression of smaller 
gradients by larger ones to focus 
edge localization; 

4. gradient magnitude thresholding
and linking that uses hysteresis to
start linking at strong edge positions
and also to track weaker edges.

The figure shows an example of the 
edge detection results: [JKS95:5.6.1] 


canonical configuration: A stereo cam- 
era configuration in which the optical 
axes of the cameras are parallel, the 
baselines are parallel to the image 
planes and the horizontal axes of the 
image planes are parallel. This results 
in epipolar lines that are parallel to 
the horizontal axes, hence simplify- 
ing the search for correspondences: 
[Che90] 


canonical correlation analysis (CCA):
With two vector-valued random variables
$x$ and $y$, the goal of CCA is to
get an insight into the (linear) relationships
between the two sets of variables.
This is achieved by finding linear
combinations $u = a^T x$ and $v = b^T y$ so
that the cross correlation between $u$
and $v$ is maximized. The variables $u$
and $v$ are the first pair of canonical
variables. The second pair of canonical
variables is obtained by maximizing the
same correlation, subject to the constraint
that they are uncorrelated with
the first pair of canonical variables; this
process may be continued up to the
minimum dimensionality of $x$ and $y$. In
practice, the canonical correlation vec- 
tors are obtained via eigenvector com- 
putations. See also kernel canonical 
correlation analysis [MKB79:Ch. 10] 


canonical direction: The simplest,
most obvious, normal (given prior
environment or domain knowledge) 
or shortest magnitude direction within 
the context used. For example, the 
canonical direction of travel, implied 
from the motion field, is forward. 
[BPNG99:Ch. 3] 


canonical factor: The base vectors identified
in canonical correlation analysis,
such that the correlation between the
projections of the two multidimensional
variables upon which CCA is
being performed onto these canonical
factors (vectors) is mutually maximized.
[GX11:5.4.2]


canonical variate: The magnitude
of the maximized projection onto
the canonical factor in canonical 
correlation analysis. [GX11:5.4.2] 


capsule endoscopy: An endoscope 
technique, for examination of the 
human digestive tract for medical diag- 
nosis, that uses a small un-tethered 
camera that is the size and shape of 
a pharmaceutical pill. The capsule is
swallowed by the patient and records 
images of the gastrointestinal tract 
including the small intestine which is 
difficult to image via traditional teth- 
ered endoscopy approaches. Images 
are transmitted from the capsule via 
wireless data transmission and the 
capsule is recovered naturally when 
passed by the patient. [SSM06] 


cardiac gated dynamic SPECT: A 


single photon emission computed 
tomography (SPECT) technique for 
facilitating cardiac image analysis 
whereby the timing of a sequence 
of image acquisitions is dictated (i.e.,
gated) from an electrocardiogram 
(heart monitor) such that the SPECT 
images show the heart over a full con- 
traction cycle. [TYW05] 


cardiac image analysis: Techniques 


involving the development of 3D 
vision algorithms for tracking 
the motion of the heart from 
nuclear magnetic resonance and 
echocardiograph images. [Par02] 


Cartesian coordinates: A position 


description system where an
n-dimensional point, P, is described by
exactly n coordinates with respect 
to n linearly independent and often 
orthonormal vectors, known as axes: 
[WP:Cartesian_coordinate_system] 


cartography: The study of maps and


map-building. Automated cartography 
is the development of algorithms that 


reduce the manual effort in map build- 
ing. [WP:Cartography] 


cascaded Hough transform: An appli- 
cation of several successive Hough 
transforms, with the output of one 
transform used as input to the next. 
[TGPM98] 


cascaded learning: A supervised 
learning technique whereby a binary 
classifier is formed by successive 
selection of the best performing weak 
classifier (weak learner) from a large 
set. This is initially performed using a 
variant of boosting over the training 
set and then over the examples that 
the previously selected classifiers 
mis-classified. This selected set of 
weak learners is then ordered (by 
order of training performance) in 
a pass-through cascade to classify 
unseen examples. Commonly used 
for object detection (e.g., faces): 
[Sze10:14.1] For example: 


[Figure: a cascaded classifier. Each stage
either rejects the example or passes it to
the next stage; an example that passes
every stage is reported as a detected
object.]
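
The pass-through behavior of such a cascade can be sketched as follows. This is a minimal illustration rather than the trained detector described in the cited source: the stage functions and their thresholds are hypothetical stand-ins for boosted weak learners.

```python
from typing import Callable, Sequence, Tuple

# Each stage is a hypothetical stand-in for a trained weak classifier:
# it returns True to pass the example on and False to reject it.
Stage = Callable[[float], bool]


def cascade_classify(x: float, stages: Sequence[Stage]) -> Tuple[bool, int]:
    """Run x through the stages in order, stopping at the first rejection.

    Returns (detected, number_of_stages_evaluated).
    """
    evaluated = 0
    for stage in stages:
        evaluated += 1
        if not stage(x):
            return False, evaluated  # rejected: later stages are skipped
    return True, evaluated  # passed every stage: object detected


# Illustrative thresholds only, ordered from loosest to strictest.
stages = [lambda x: x > 0.0, lambda x: x > 0.5, lambda x: x > 0.9]
```

Because most candidates are rejected by an early stage, the average cost per example stays low even when the full cascade is long.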


cascading Gaussians: A term referring 
to the fact that the convolution of a 
Gaussian with itself is another Gaus- 
sian. [JKS95:4.5.4] 
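
A small numerical check of this property, as a sketch: it uses sampled, truncated 1D Gaussians (an approximation of the continuous case), and the helper names are illustrative. Convolving a Gaussian of standard deviation sigma with itself gives a Gaussian whose variance is doubled, i.e., standard deviation sigma * sqrt(2).

```python
import math


def gaussian_kernel(sigma, radius):
    """Sampled 1D Gaussian on [-radius, radius], normalized to sum to 1."""
    vals = [math.exp(-(i * i) / (2.0 * sigma * sigma))
            for i in range(-radius, radius + 1)]
    total = sum(vals)
    return [v / total for v in vals]


def convolve_full(a, b):
    """Full discrete convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out


def variance(kernel):
    """Variance of a normalized kernel about its own mean."""
    center = (len(kernel) - 1) / 2.0
    mean = sum((i - center) * v for i, v in enumerate(kernel))
    return sum(((i - center) - mean) ** 2 * v for i, v in enumerate(kernel))


g = gaussian_kernel(2.0, 12)
gg = convolve_full(g, g)  # approximately a Gaussian with sigma = 2 * sqrt(2)
```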


cast shadow: A shadow thrown by an 
object on another object. See also 
attached shadow. [FP03:5.3.1] 


CAT: See X-ray CAT. 


catadioptric optics: The general 
approach of using mirrors in com- 
bination with conventional imaging 
systems to get wide viewing angles 
(180°). It is desirable that a catadiop- 
tric system has a single viewpoint 
because it permits the generation 


of geometrically correct perspective 
images from the captured images. 
[WP:Catadioptric_system] 


categorization: The subdivision of a 
set of elements into clearly distinct 
groups, or categories, defined by spe- 
cific properties. Also the assignment of 
an element to a category or recognition 
of its category. [WP:Categorization] 


category: A group or class used in a 
classification system. For example, in 
mean and Gaussian curvature shape 
classification, the local shape of a sur- 
face is classified into four main cate- 
gories: planar, ellipsoidal, hyperbolic 
and cylindrical. Another example is 
the classification of observed grazing 
animals into one of {sheep, cow or 
horse}. See also categorization. [WP: 
Categorization] 


caustic: The surface enveloping the set 
of camera rays, i.e., it is tangent to all 
rays. Often seen in reflective light mod- 
els as the projection of that envelope 
onto another surface: [SRT11:3.4.6] 


[Figure: a camera (pinhole model) and its
field of view.]


CAVE: A recursive acronym for Cave 
Audio-Visual Experience - an immersive
audio-visual environment comprising
several screens surrounding the viewer
(forming a complete 3D surround
“cave” environment) with integrated
head tracking to adapt content display
to the viewer's head movements.
[AGTL09:6.1]


CBIR: See content-based image retrieval. 
[WP:Content-based_image_retrieval# 
Query_techniques] 


CCD: Charge-coupled device. A solid 
state device that can record the num- 
ber of photons falling on it. 
A 2D matrix of CCD elements is used,
together with a lens system, in digi- 
tal cameras where each pixel value in 
the final image corresponds to the out- 
put of one or more of the elements. 
[FP03:1.4.1] 


CCIR camera: Camera fulfilling color 
conversion and pixel formation crite- 
ria laid out by the Comité Consul- 
tatif International pour la Radio. 
[Umb98:1.7.3] 


cell microscopic analysis: Automated 
image-processing procedures for find- 
ing and analyzing different cell types 
from images taken by a microscope 
vision system. Common examples are 
the analysis of pre-cancerous cells or 
blood cells. [WP:Live_blood_analysis] 


cellular array: A massively parallel com- 
puting architecture, composed of a 
high number of processing elements. 
Particularly useful in machine vision 
applications when a simple 1:N map- 
ping is possible between image pix- 
els and processing elements. See also 
systolic array and single instruction 
multiple data. [WP:Systolic_array] 


center line: See medial line. 


center of curvature: The center of the 
circle of curvature (or osculating
circle) at a point P of a plane curve at
which the curvature is nonzero. The 
circle of curvature is tangent to the 
curve at P, has the same curvature as 
the curve at P, and lies towards the 
concave (inner) side of the curve. The 
figure shows the circle and center of 
curvature, C, of a curve at point P:
[FP03:19.1.1] 
center of mass: The point within an


object at which the force of gravity 
appears to act. If the object can be 
described by a multidimensional point 
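set $\{x_i\}$ containing N points, the
center of mass is
$\frac{1}{N}\sum_{i=1}^{N} x_i f(x_i)$, where
$f(x_i)$ is the value of the image (e.g.,
binary image or gray scale) at point $x_i$.
For gray-scale values the sum is
normalized by $\sum_i f(x_i)$ rather than N.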
[JKS95:2.2.2] 


center of projection: The position of 


the observer or camera in a 3D to 
2D planar projection (e.g., perspective 
projection). Within a pinhole camera 
model this corresponds to the cam- 
era position within the global frame 
of reference with the center of pro- 
jection occurring behind the imaging 
plane where all of the projection lines 
from the scene converge. Also known 
as the camera center or the optical 
center. [HZ00:6.1] 


center-surround operator: An opera- 


tor that is particularly sensitive to spot- 
like image features, which have higher 
(or lower) pixel values in the center 
than in the surrounding areas. The fol- 
lowing simple convolution mask can 
be used for orientation-independent 
spot detection: 


-1/8  -1/8  -1/8
-1/8    1   -1/8
-1/8  -1/8  -1/8

[JRW97]
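
A minimal sketch of spot detection with such an operator. The mask used here (center weight 1, eight surround weights of -1/8) is an assumption chosen because it sums to zero and has no preferred orientation; the function name is illustrative.

```python
# Assumed 3x3 center-surround mask: weights sum to zero, so flat regions
# give no response, and the response does not depend on orientation.
MASK = [
    [-1 / 8, -1 / 8, -1 / 8],
    [-1 / 8, 1.0, -1 / 8],
    [-1 / 8, -1 / 8, -1 / 8],
]


def apply_mask(image, mask=MASK):
    """Apply a 3x3 mask at every interior pixel (border pixels stay 0)."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            out[r][c] = sum(
                mask[i][j] * image[r + i - 1][c + j - 1]
                for i in range(3)
                for j in range(3)
            )
    return out
```

A bright spot on a dark background gives a strong positive response at its center and a weak negative response around it.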


central moment: An image moment 


that is translation invariant because the 
center of mass has been subtracted 


during the calculation. If f(c,r) is 
the input image pixel value (binary 
image or gray scale) at row r and
column c then the pq-th central moment
is $\sum_{c}\sum_{r} (c - \hat{c})^{p} (r - \hat{r})^{q} f(c, r)$, where
$(\hat{c}, \hat{r})$ is the center of mass of the image.
[Sch89:Ch. 6] 
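
The translation invariance can be checked with a short sketch. It assumes the image is a list of rows of pixel values and computes the center of mass as an intensity-weighted mean; the function names are illustrative.

```python
def centroid(image):
    """Intensity-weighted center of mass as (column, row)."""
    total = sum(v for row in image for v in row)
    if total == 0:
        raise ValueError("image has no mass")
    c_bar = sum(c * v for row in image for c, v in enumerate(row)) / total
    r_bar = sum(r * v for r, row in enumerate(image) for v in row) / total
    return c_bar, r_bar


def central_moment(image, p, q):
    """pq-th central moment: sum of (c - c_bar)^p (r - r_bar)^q f(c, r)."""
    c_bar, r_bar = centroid(image)
    return sum(
        (c - c_bar) ** p * (r - r_bar) ** q * v
        for r, row in enumerate(image)
        for c, v in enumerate(row)
    )


# The same 2x2 blob at two different positions.
blob_top_left = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
blob_bottom_right = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
```

Both blobs give the same central moments even though their centers of mass differ.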


central projection: Defined by projec- 
tion of an image on the surface of a 
sphere onto a tangential plane by rays 
from the center of the sphere. The 
intersection of a plane with the sphere 
is a “great circle”. The image of the 
great circle under central projection is 
a line. Also known as the “gnomonic” 
projection. [Sch89:Ch. 2] 


CENTRIST bag of words descriptor: 
A feature descriptor, akin to the sem- 
inal SIFT descriptor, specifically tar- 
geted at scene classification tasks via 
a bag of words classification frame- 
work. It primarily encodes the struc- 
tural properties of a scene, suppressing 
detailed textural information, result- 
ing in superior scene classification per- 
formance compared to contemporary 
feature descriptors within the same 
framework. [WR11] 


centroid: See center of mass. 


certainty representation: Any of a set 


of techniques for encoding the belief 
in a hypothesis, conclusion, calcula- 
tion etc. Example representation meth- 
ods are probability and fuzzy logic. 
[Mor88] 


chain code: An efficient method for 


boundary representation where an 
arbitrary curve is represented by a 
sequence of small vectors of unit 
length in a limited set of possible 
directions. Depending on whether 
the grid has 4 connectedness or 
8 connectedness, the chain code is 
defined as the digits from 0 to 3 or 0 to 
7, assigned to the four or eight neigh- 
boring grid points counter-clockwise. 
For example, the string 222233000011 
describes the small curve in the figure 
using a 4-connected coding scheme, 
starting from the upper right pixel:
[JKS95:6.2.1] 
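
A minimal sketch of 4-connected chain coding. The direction numbering (0 = right, then counter-clockwise: 1 = up, 2 = left, 3 = down) and the (row, column) pixel convention with rows increasing downward are assumptions; other sources number the directions differently.

```python
# (d_row, d_col) -> code, numbered counter-clockwise starting from "right".
DIRECTIONS_4 = {(0, 1): 0, (-1, 0): 1, (0, -1): 2, (1, 0): 3}


def chain_code_4(path):
    """Encode a list of 4-connected (row, col) pixels as a chain code string."""
    codes = []
    for (r0, c0), (r1, c1) in zip(path, path[1:]):
        step = (r1 - r0, c1 - c0)
        if step not in DIRECTIONS_4:
            raise ValueError(f"pixels {(r0, c0)} and {(r1, c1)} are not 4-connected")
        codes.append(str(DIRECTIONS_4[step]))
    return "".join(codes)


# Two steps left, one step down, one step right.
example_path = [(0, 3), (0, 2), (0, 1), (1, 1), (1, 2)]
```

Only the starting pixel and the code string need to be stored, which is what makes the representation compact.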
chamfer matching: A matching
technique based on the comparison of
contours and on the concept of chamfer
distance assessing the similarity of two 
sets of points. It can be used for match- 
ing edge images using the distance 
transform. See also Hausdorff distance. 
To find the parameters that register 
a library image and a test image, the 
binary edge map of the test image is 
compared to the distance transform. 
Edges are detected on image 1 and the 
distance transform of the edge pixels 
is computed. The edges from image 2 
are then matched. In the figure, trans- 
lation and scale parameters are used: 
[ZRH03:2.3]
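
The procedure can be sketched as follows, under simplifying assumptions: only translation is searched (no scale), the distance transform is computed by brute force, and the function names are illustrative.

```python
import math


def distance_transform(edge_points, height, width):
    """Brute-force Euclidean distance from each pixel to the nearest edge pixel."""
    return [
        [min(math.hypot(r - er, c - ec) for er, ec in edge_points)
         for c in range(width)]
        for r in range(height)
    ]


def chamfer_score(dt, template_points, offset):
    """Mean distance-transform value under the translated template points."""
    dr, dc = offset
    total = 0.0
    for r, c in template_points:
        rr, cc = r + dr, c + dc
        if not (0 <= rr < len(dt) and 0 <= cc < len(dt[0])):
            return math.inf  # template falls outside the image
        total += dt[rr][cc]
    return total / len(template_points)


def best_translation(dt, template_points):
    """Exhaustively search non-negative translations for the lowest score."""
    offsets = [(dr, dc) for dr in range(len(dt)) for dc in range(len(dt[0]))]
    return min(offsets, key=lambda o: chamfer_score(dt, template_points, o))
```

A score of zero means every template edge point lands exactly on an image edge point; small positive scores indicate near misses, which is what makes the measure tolerant of slight misalignment.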
chamfering: See distance transform.


change detection: See motion 
detection. 


character recognition: See optical 
character recognition. 


character verification: A process that 
confirms that printed or displayed 
characters are within some tolerance 
that guarantees that they are readable 
by humans. Used in applications such 
as labeling. [PP03]


characteristic view: An approach to 
object representation in which an 
object is encoded by a set of views 
of the object. The views are chosen 
so that small changes in viewpoint 
do not cause large changes in appear- 
ance (e.g., a singularity event). Real 
objects have an impractically large number of
singularities, so practical approaches 
to creating characteristic views require 
approximations, such as only using 
views on a tessellated viewsphere, 
or only representing the viewpoints 
that are reasonably stable over large 
ranges on the viewsphere. See also 
aspect graph and appearance-based 
recognition. [WF90] 


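chess-board distance metric: The
distance between two points given by
the largest absolute difference between
their coordinates (the Chebyshev or
$L_\infty$ distance). Compare Manhattan
distance. [WP:Chebyshev_
distance]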


chi-squared distribution ($\chi^2$): The
distribution of squared lengths of
vectors drawn from a zero-mean,
unit-covariance normal distribution.
Specifically let the cumulative
distribution function of the $\chi^2$
distribution with d degrees of freedom
be denoted $\chi^2(d, u)$. Then the
probability that a point x drawn from
such a d-dimensional Gaussian
distribution will have squared norm
$|x|^2$ less than a value t is given by
$\chi^2(d, t)$. This figure shows empirical
and theoretical plots of the $\chi^2$
probability density function with five
degrees of freedom: [WP:Chi-
squared_distribution]
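
A small Monte Carlo sketch of this relationship. It assumes a standard (zero-mean, unit-variance, independent components) Gaussian and compares against the closed-form cumulative distribution function that holds for d = 2 only; function names and the sample count are illustrative.

```python
import math
import random


def chi2_cdf_2dof(t):
    """Closed-form chi-squared CDF, valid only for d = 2 degrees of freedom."""
    return 1.0 - math.exp(-t / 2.0) if t > 0 else 0.0


def empirical_cdf(d, t, n_samples=20000, seed=0):
    """Fraction of d-dimensional standard Gaussian samples with |x|^2 < t."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        squared_norm = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d))
        if squared_norm < t:
            hits += 1
    return hits / n_samples
```

For example, `empirical_cdf(2, 2.0)` should be close to `chi2_cdf_2dof(2.0)`, up to sampling error.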
chi-squared test: A statistical test of the


hypothesis that a set of sampled values 
has been drawn from a given distribu- 
tion. See also chi-squared distribution. 
[WP:Chi-squared_test] 


chip sensor: A CCD or other 


semiconductor-based, light-sensitive 
imaging device. [ABD+91] 


chord distribution: A 2D shape descrip- 


tion technique based on all chords 
in the shape (that is all pairwise seg- 
ments between points on the bound- 
ary). Histograms of their lengths and 
orientations are computed. The values 
in the length histogram are invariant to 
rotation and scale linearly with the size 
of object. The orientation histogram 
values are invariant to scale and shifts. 
[YJ84] 


chroma: The color portion of a video 


signal that includes hue and saturation, 
requiring luminance to make it visible. 
It is also referred to as chrominance. 
[WP:Chroma] 


chroma keying: The separation of a 


given object or background, based
on its chrominance value alone.
Often used in media production to sep- 
arate a motion silhouette from a con- 
trolled studio background to allow the 
insertion of an artificial background: 
[MNTT12:2.2.6] 


chromatic aberration: A focusing 


problem where light of different 
wavelengths (color) is refracted by 
different amounts and consequently 
images at different places. As blue light 
is refracted more than red light, objects 
may be imaged with color fringes at 
places where there are strong changes 
in lightness. [FP03:1.2.3] 


chromaticity diagram: A 2D slice 


of a 3D color space. The CIE 1931 
chromaticity diagram is the slice 


through the CIE XYZ color space where
x + y + z = 1.
This slice is shown in the figure. The 
color gamut of standard 0-1 RGB 
values in this model is the bright trian- 
gle in the center of the horseshoe-like 
shape. Points outside the triangle have 
had their saturations truncated. See 
also CIE chromaticity coordinates. 
[WP:CIE_xyY#CIE_xy_chromaticity_ 
diagram_and_the_CIE_xyY_color_
space] 
chrominance: 1) The part of a video
signal that carries color.
2) One or both of the color axes in a 
3D color space that distinguishes inten- 
sity and color. See also chroma. [WP: 
Chrominance] 


chromosome analysis: Vision tech- 
nique used for the diagnosis of some 
genetic disorders from microscope 
images. This usually includes sorting 
the chromosomes into the 23 pairs and 
displaying them in a standard chart. 
[PRS+95] 


CID: Charge injection device. A type of 
semiconductor imaging device with a 
matrix of light-sensitive cells. Every 
pixel in a CID array can be individu- 
ally addressed via electrical indexing of 
row and column electrodes. It differs
from a CCD, which transfers collected
charge out of the pixel during readout,
thus erasing the image. [BM76]


CIE chromaticity coordinates: Coordi- 
nates in the CIE color representation 
system with reference to three ideal 
standard colors X, Y and Z. Any visible 
color can be expressed as a weighted 
sum of these three ideal colors, e.g., 


$p = w_1 X + w_2 Y + w_3 Z$. The normalized
values are given by

$$x = \frac{w_1}{w_1 + w_2 + w_3}, \quad y = \frac{w_2}{w_1 + w_2 + w_3}, \quad z = \frac{w_3}{w_1 + w_2 + w_3}$$


Since x+ y+ z= 1, we only need to 
know two of these values, say (x, y). 
These are the chromaticity coordi- 
nates. [JKS95:10.3] 
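
The normalization can be sketched in a few lines; the function name is illustrative.

```python
def chromaticity(w1, w2, w3):
    """Normalize three tristimulus weights so that x + y + z = 1."""
    total = w1 + w2 + w3
    if total == 0:
        raise ValueError("chromaticity is undefined when all weights are zero")
    return w1 / total, w2 / total, w3 / total


x, y, z = chromaticity(2.0, 1.0, 1.0)  # z is redundant: z = 1 - x - y
```

Scaling all three weights by the same factor leaves (x, y) unchanged, which is why chromaticity describes color independently of overall intensity.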


CIE L*A*B* model: A color 


representation system based on 
that proposed by the Commission 
Internationale de l'Éclairage (CIE)
as an international standard for color 
measurement. It is designed to be 
device-independent and perceptually 
uniform (i.e., the separation between 
two points in this space corresponds 
to the perceptual difference between 
the colors). L*A*B* color consists of 
a luminance, L*, and two chromatic 
components: A* from green to red and 
B* from blue to yellow. See also CIE 
L*U*V* model. [JKS95:10.3] 


CIE L*U*V* model: A color


representation system where col- 
ors are represented by luminance 
L* and two chrominance components
(U*, V*). A given change in
value in any component corresponds 
approximately to the same perceptual 
difference. See also CIE L*A*B* model. 
[JKS95:10.3]


circle: A curve consisting of all points on 


a plane lying a fixed radius r from the 
center point C: 
The arc defining the entire circle is
known as the circumference and is of
length $2\pi r$. The area contained inside
the curve is given by $A = \pi r^2$. A circle
centered at the point (h, k) has
equation $(x - h)^2 + (y - k)^2 = r^2$. The
circle is a special case of the ellipse. 
[NA05:5.4.3] 


circle detection: A class of algorithms, 
e.g., the Hough transform, that locate 
the centers and radii of circles in 
digital images. In general images, 
scene circles usually appear as ellipses: 
[Dav90:Ch. 9] 


circle fitting: Deriving circle parame- 


ters from either 2D or 3D observa- 
tions. As with all fitting problems, one 
can search the parameter space using a 
good metric (e.g., a Hough transform) 
or can solve a well-posed least-squares 
problem. [JKS95:6.8.4] 
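
One possible least-squares formulation for 2D points, as a sketch: it uses the algebraic form x^2 + y^2 + Dx + Ey + F = 0, which is linear in (D, E, F), and solves the normal equations directly. This is one of several fitting approaches and is not necessarily the one in the cited source; the function names are illustrative.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_circle(points):
    """Algebraic least-squares circle fit; returns (center_x, center_y, radius)."""
    rows = [(x, y, 1.0) for x, y in points]
    rhs = [-(x * x + y * y) for x, y in points]
    # Normal equations (A^T A) p = A^T b for p = (D, E, F).
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(3)]
    d, e, f = solve_linear(ata, atb)
    cx, cy = -d / 2.0, -e / 2.0
    return cx, cy, math.sqrt(cx * cx + cy * cy - f)
```

At least three non-collinear points are required; with noisy data the algebraic fit is a reasonable starting point for a geometric refinement.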


circular convolution: The circular
convolution $c_k$ of two vectors $\{x_i\}$ and
$\{y_j\}$ that are of length n is defined
as $c_k = \sum_{i=0}^{n-1} x_i y_j$ where $0 \le k < n$
and $j = (k - i) \bmod n$. [WP:Circular_
convolution]
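
A direct sketch of the definition, using the common index convention j = (k - i) mod n (stated here as an assumption about the intended convention); the function name is illustrative.

```python
def circular_convolution(x, y):
    """c_k = sum_i x_i * y_((k - i) mod n) for two equal-length sequences."""
    n = len(x)
    if len(y) != n:
        raise ValueError("both vectors must have the same length")
    return [sum(x[i] * y[(k - i) % n] for i in range(n)) for k in range(n)]
```

Convolving with a unit impulse at index 0 returns the input unchanged, and an impulse at index 1 rotates the input by one position.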


circularity: One measure C of the 


degree to which a 2D shape is similar 
to a circle is given by 


$$C = 4\pi \left( \frac{A}{P^2} \right)$$

where C varies from 0 (non-circular) 
to 1 (perfectly circular). A is the object 
area and P is the object perimeter. [WP: 
Circular_definition] 
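
A short worked example of the measure C = 4πA/P², comparing a circle with a square; the function name is illustrative.

```python
import math


def circularity(area, perimeter):
    """C = 4 * pi * A / P^2; equals 1 for an ideal circle."""
    return 4.0 * math.pi * area / perimeter ** 2


circle = circularity(math.pi * 2.0 ** 2, 2.0 * math.pi * 2.0)  # radius 2
square = circularity(1.0, 4.0)  # unit square: pi / 4, roughly 0.785
```

On digital images the measured perimeter depends on how boundary length is estimated, so values for a rasterized circle are only approximately 1.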


city block distance: See Manhattan 
distance. 


class separability: A measure of the sep- 


arability of two or more classes based 
on their class conditional distributions,
e.g., on the between-class scatter
matrix and within-class scatter matrix 
(as for the Fisher linear discriminant) 
or on the Bhattacharyya distance. 
[Fuk90:Ch. 10] 


classification: A general term for the 


assignment of a label (or class) to a 
structure (e.g., pixels, regions, lines 
etc.). Example classification problems 
include: labeling pixels as road, veg- 
etation or sky; deciding whether 
cells are cancerous based on shape; 
deciding whether the observed face 
belongs to an allowed system user. 
[Dav90:1.2.1] 


classifier: An algorithm assigning a class 


to an input pattern or data. See also 
classification, unsupervised learning, 
clustering, supervised classification 
and rule-based classification. [FP03:Ch. 
22] 


clipping: Removal or non-rendering of 


objects that do not coincide with the 
display area. [NA05:3.3.1] 


clique: A fully connected subgraph of a 


graph, G. In a fully connected graph, 
every vertex is a neighbor of all oth- 
ers. The figure shows a clique with five 
nodes: 


(There are other cliques in the 
graph with fewer nodes, e.g., 
ABac with four nodes etc.) [WP: 
Clique_(graph_theory)] 
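
A minimal check of the definition, assuming an undirected graph stored as a dictionary of neighbor sets. The example graph is hypothetical and is not the one in the figure.

```python
from itertools import combinations


def is_clique(adjacency, nodes):
    """True if every pair of the given nodes is joined by an edge."""
    return all(v in adjacency[u] for u, v in combinations(nodes, 2))


# Hypothetical graph: A, B, C, D are mutually connected; E hangs off D.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"A", "B", "C", "E"},
    "E": {"D"},
}
```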


close operator: The application of 


two binary morphology operators (the 
dilate operator followed by the erode 
operator), which has the effect of fill- 
ing small holes in an image. The figure 
shows the result of closing with a mask 
22 pixels in diameter: [JKS95:2.6] 
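
A sketch of closing on a small binary image, assuming a 3 x 3 square structuring element and ignoring neighbors that fall outside the image; the function names are illustrative.

```python
def _neighborhood(image, r, c):
    """Values in the 3x3 window around (r, c), clipped to the image."""
    height, width = len(image), len(image[0])
    return [
        image[rr][cc]
        for rr in range(max(0, r - 1), min(height, r + 2))
        for cc in range(max(0, c - 1), min(width, c + 2))
    ]


def dilate(image):
    """A pixel becomes 1 if any pixel in its window is 1."""
    return [[int(any(_neighborhood(image, r, c))) for c in range(len(image[0]))]
            for r in range(len(image))]


def erode(image):
    """A pixel stays 1 only if every pixel in its window is 1."""
    return [[int(all(_neighborhood(image, r, c))) for c in range(len(image[0]))]
            for r in range(len(image))]


def binary_close(image):
    """Closing: dilation followed by erosion."""
    return erode(dilate(image))
```

With this structuring element a one-pixel hole is filled, while a 3 x 3 hole survives; larger masks fill correspondingly larger holes.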


closed-circuit television (CCTV): A 
video capture system installed in a spe- 
cific environment, with limited viewer 
access, for the purposes of visual 
surveillance. [GS05] 


closed set recognition: Performing 
recognition over a closed set of enti- 
ties such that the query object must be 
in the a priori set. Commonly used in 
biometrics where, e.g., an individual is 
recognized from a known set in a face 
recognition database. [Wec06:7.7] 


closed world: The assumption that a 
complete taxonomy of scene objects is 
known and that each pixel should be 
explained as belonging to one of the 
scene objects. A specialization of the 
general artificial intelligence closed- 
world assumption, i.e., that everything 
that is not known to be true is false. 
Here interpreted as every pixel not 
explained as belonging to a known 
scene object is thus not part of a (rele- 
vant) scene object. [IB95] 


clump splitting: In some classes of 
images a clump of objects may be 
detected or isolated, e.g., clumps of 
cells in biological images. Clump split- 
ting seeks segmentation of the multiple 
objects from the clump. 


clustering: 1) Identifying the subsets of 
a set of data points $\{x_i\}$ based on some
property, such as proximity. 
2) Grouping together image regions 
or pixels into larger, homogeneous 
regions sharing some property. See 
also: agglomerative clustering, divisive 
clustering, graph theoretic clustering, 
hierarchical clustering, k-means 
clustering, non-parametric clustering, 
spectral clustering and squared error 
clustering. [FP03:14.1.2] 


clutter: A generic term for unmod- 
eled or uninteresting elements in an 
image. For example, a face detector 


generally has a model for faces and 
other objects are regarded as clutter. 
The background of an image is often 
expected to include clutter. Loosely 
speaking, clutter is more structured 
than noise. [FP03:18.2.1] 


CMOS: Complementary metal-oxide
semiconductor. A technology used 
in making image sensors and other 
computer chips. [NA05:1.4.1] 


CMY: See CMYK. 
CMYB: See CMYK. 


CMYK: Cyan, magenta, yellow and 
black color model. It is a subtractive 
model where colors are absorbed by 
a medium, e.g., pigments in paints. 
Where the RGB color model adds hues 
to black to generate a particular color, 
the CMYK model subtracts from white. 
Red, green and blue are secondary col- 
ors in this model: [Gal90:3.7] 
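
The subtractive relationship can be illustrated with the simple device-independent conversion below. This is a sketch under stated assumptions: values are in the range 0 to 1, and real printing workflows use device color profiles rather than this formula.

```python
def rgb_to_cmyk(r, g, b):
    """Naive RGB -> CMYK conversion for components in [0, 1]."""
    k = 1.0 - max(r, g, b)  # black: how far the brightest channel is from white
    if k == 1.0:
        return 0.0, 0.0, 0.0, 1.0  # pure black; avoid dividing by zero
    c = (1.0 - r - k) / (1.0 - k)
    m = (1.0 - g - k) / (1.0 - k)
    y = (1.0 - b - k) / (1.0 - k)
    return c, m, y, k
```

Red comes out as full magenta plus full yellow, which is the sense in which red, green and blue are secondary colors in this model.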


coarse-to-fine processing: Multi-scale 
algorithm application that begins by 
processing at a large, or coarse, level 
and then, iteratively, processes to 
a small, or fine, level. Importantly, 
results from each level must be propa- 
gated to ensure a good final result. It is 
used for computing, e.g., optical flow. 
[FP03:7.7.2] 


coaxial illumination: Front lighting 
with the illumination path running 
along the imaging optical axis. An 
advantage of this technique is that 
there are no visible shadows or direct 
specularities from the camera’s
viewpoint: [ZS93]


[Figure: coaxial illumination, showing the
camera, optical axis, half-silvered mirror,
light source and target area.]


codebook: The set of identified


code-words resulting from feature 
descriptor clustering when using a 
bag of words classification approach. 
[Sze10:14.4.3] 


codeword: See codebook and bag of 
features. 


cognitive vision: A part of computer 
vision focusing on techniques for
recognition and categorization of 
objects, structures and events, learning 
and knowledge representation, con- 
trol and visual attention. [CN06:Ch. 4] 


coherence detection: Stereo vision 
technique where two images are 
searched for maximal patch correla- 
tions to generate features. It relies on 
having a good correlation measure and 
a suitably chosen patch size. [Hen97] 


coherence scale/volume: In 2D or 3D 
confocal microscopy, the region or 
area in which the light waves are 
in coherence (i.e., in synchronization).
Beyond a certain distance or time,
synchronization differences arise.
The coherence time is the time
interval during which light waves
traveling over a certain distance (the
coherence length) maintain a phase
difference of less than $\pi$. The
coherence surface is the spatial region
over which the phase difference of the
optical field is less than $\pi$. It follows
that the coherence volume is the
product of the coherence length and
the coherence surface. Interference
between superimposed coherent waves
is only visible in this volume.
[Paw06:Ch. 5]


coherent fiber optics: Many fiber-optic 
elements bound into a single cable 
component with the individual fiber 
spatial positions aligned, so that it can
be used to transmit images. [Oko82] 


coherent light: Light, e.g., generated 
by a laser, in which the emitted light 
waves have the same wavelength and 
are in phase. Such light waves can 
remain focused over long distances. 
[WP:Collimated_light] 


coincidental alignment: When two 
structures seem to be related, but in 
fact the structures are independent or 
the alignment is just a consequence 
of being in some special viewpoint. 
Examples are the collinearity of ran- 
dom edges or coplanarity of surfaces, 
or object corners being nearby. See 
also non-accidentalness. [HN00]


collimate: To align the optics of a vision 
system, especially those in a telescopic 
system. [WP:Collimated_light] 


collimated lighting: A special form of 
structured light (e.g., directional back 
lighting). A collimator produces light 
in which all the rays are parallel: 


[Figure: collimated lighting, showing the
lamp, optical axis and camera.]


It is used to produce well-defined shad- 
ows that can be cast directly onto a 
sensor or an object. [KKK03] 


collinearity: The property of lying along 
the same straight line. [HZ00:1.3] 

collineation: See projective
transformation.


color: Color is both a physical and 
a psychological phenomenon. Physi- 
cally, color refers to the nature of an 
object texture that allows it to reflect 
or absorb particular parts of the light 
incident on it. See also reflectance. The 
psychological aspect is characterized 
by the visual sensation experienced 
when light of a particular frequency or 
wavelength is incident on the retina. 
The key paradox here concerns why 
light of slightly different wavelengths 
should be so perceptually different
(e.g., red and blue). [Hec87:4.4] 


color-based database indexing: 
See color-based image retrieval. 
[WP:Content-based_image_retrieval# 
Color] 


color-based image retrieval: An exam- 
ple of the more general image database 
indexing process, where one of the 
main indices into the image database 
comes from color samples, the color 
distribution from a sample image, or 
a set of text color terms (e.g., “red”). 
[WP:Content-based_image_retrieval# 
Color] 


color clustering: See color image
segmentation.


color constancy: The ability of a vision 
system to assign a color description 
to an object that is independent of 
the lighting environment. This allows 
the system to recognize objects under 
many different lighting conditions. The 
human vision system does this auto- 
matically, but most machine vision sys- 
tems cannot. For example, humans 
observing a red object in a cluttered 
scene under a blue light will still see 
the object as red. A machine vision sys- 
tem might see it as a very dark blue. 
[WP:Color_constancy] 


color co-occurrence matrix: A matrix 
(actually a histogram) whose elements 
represent the sum of color values exist- 
ing in a given image in a sequence, 
at a certain pixel position relative 
to another color existing at a dif- 
ferent position in the image. See 
also co-occurrence matrix. [WP:Co- 
occurrence_matrix] 


color correction: 1) Adjustment of col- 
ors to achieve color constancy. 
2) Any change to the colors of an 
image. See also gamma correction. 
[WP:Color_correction] 


color differential invariant: A type of 
differential invariant based on color
information, such as
$\frac{\nabla R \cdot \nabla G}{\|\nabla R\| \, \|\nabla G\|}$, which
is invariant to translation, rotation
and variations in uniform illumination.
[MGD98]


color doppler: A method for noninva- 
sively imaging blood flow through the 
heart or other body parts by display- 
ing flow data on a two-dimensional 
echocardiographic image. Blood flow 
in different directions is displayed in 
different colors. [HNH+87] 


color edge detection: The process of 
edge detection in color images. A sim- 
ple approach combines (e.g., by addi- 
tion) the edge strengths of the individ- 
ual RGB color planes. [Kos95] 


color efficiency: A trade-off that is 
made with lighting systems, where 
conflicting design constraints require 
energy-efficient production of light 
while simultaneously producing a suf- 
ficiently broad spectrum illumination 
that the colors look natural. An obvi- 
ous example of a skewed trade-off is 
with low pressure sodium street light- 
ing. This is energy efficient but has 
poor color appearance. [Sel07] 


color feature: A feature that incor- 
porates color information within its 
description of the image or local region 
of interest. [GGWG12:Ch. 14] 


color gamut: The subset of all possible 
colors that a particular display device 
(CRT, LCD, printer) can display. Physi- 
cal differences in how various devices 
produce colors mean each scanner, dis- 
play and printer has a different gamut, 
or range, of colors that it can repre- 
sent. The RGB color gamut can only 
display approximately 70% of the col- 
ors that can be perceived. The CMYK 
color gamut is much smaller, repro- 
ducing about 20% of perceivable col- 
ors. The color gamut achieved with 
premixed inks (such as the Pantone 
Matching System) is also smaller than
the RGB gamut. [WP:Gamut] 


color grading: The process of alter- 
ing or enhancing color in video or 
image content. Commonly applied 
to older video content, pre-dating 
modern color quality techniques, 
recorded to analog media prior to 
re-broadcast or re-release for viewing 
on contemporary hardware. See also 
color correction, image enhancement. 
[Hur10:Introduction] 


color halftoning: See dithering. [WP: 
Halftone#Multiple_screens_and_ 
color_halftoning] 


color histogram matching: Used in 
color indexing where the similarity 
measure is the distance between color 
histograms of two images, e.g., by 
using the Kullback-Leibler divergence 
or Bhattacharyya distance. [HSE+95] 
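
Both measures can be sketched for normalized histograms as below. The Bhattacharyya distance is taken here as the negative logarithm of the Bhattacharyya coefficient, which is one common definition; the function names and bin counts are illustrative.

```python
import math


def normalize(histogram):
    """Scale bin counts so that they sum to 1."""
    total = sum(histogram)
    return [v / total for v in histogram]


def bhattacharyya_distance(p, q):
    """-ln(sum_i sqrt(p_i q_i)); 0 for identical histograms."""
    coefficient = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.inf if coefficient == 0 else -math.log(coefficient)


def kl_divergence(p, q):
    """sum_i p_i ln(p_i / q_i); asymmetric, infinite if q_i = 0 where p_i > 0."""
    total = 0.0
    for a, b in zip(p, q):
        if a > 0:
            if b == 0:
                return math.inf
            total += a * math.log(a / b)
    return total


p = normalize([2, 1, 1])  # hypothetical 3-bin color histograms
q = normalize([1, 1, 2])
```

The Kullback-Leibler divergence is not symmetric in general, so retrieval systems that need a symmetric measure often prefer the Bhattacharyya distance or a symmetrized divergence.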


color HOG: A color extension to 
the histogram of oriented gradients 
descriptor calculating gradients in a 
given color space (based on color 
similarity) in place of the original gray 
scale gradient approach. 


color image: An image where each ele- 
ment (pixel) is a tuple of values from a 
set of color bases. [Umb98:1.7.3] 


color image restoration: See image 
restoration. 


color image segmentation: Segment- 
ing a color image into homogeneous 
regions based on some similarity cri- 


teria. The figure shows boundaries 
around typical regions: [DMS99] 


color indexing: Using color informa- 
tion, e.g., color histograms, for image 
database indexing. A key issue is vary- 
ing illumination. It is possible to use 
ratios of colors from neighboring
locations to obtain illumination invariance.
[WP:Color_index] 


color layout descriptor: Part of the 
compact set of MPEG-7 descriptors 
designed to capture the global, spa- 
tial color distribution of an image. 
This descriptor is produced by spa- 
tially dividing the image into 64 blocks 
(8 x 8), regardless of size, from which 
a representative color (e.g., the mean 
color) is extracted. This set of 64 col- 
ors, transformed to YCrCb color space, 
is then represented using the discrete 
cosine transform coefficients of each 
of the YCrCb color channels. [ISO02] 


color matching: The phenomenon of 
trichromacy means any color stimulus 
can be matched by a mixture of the 
three primary stimuli. Color matching 
is expressed as:


$C = R\,\mathbf{R} + G\,\mathbf{G} + B\,\mathbf{B}$


where a color stimulus C is matched
by R units of primary stimulus $\mathbf{R}$ mixed
with G units of primary stimulus $\mathbf{G}$
and B units of primary stimulus $\mathbf{B}$.
[Win05:2.5.1] 


color mixture model: A mixture model 
based on distributions in some color 
representation system that specifies 
both the color groups in a model 
as well as their relationships to each 
other. The conditional probability of 
an observed pixel $x_i$ belonging to an
object O is modeled as a mixture with 
K components. [JR99] 


color models: See color representation 
system. [WP:Color_model] 


color moment: A color image descrip- 
tion based on histogram moments of 
each color channel, e.g., the mean, 
variance and skewness of the his- 
tograms. [MMG99] 
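
A sketch of such a descriptor, assuming the image is given as a flat list of (R, G, B) tuples and that the moments are taken over the pixel values of each channel; the function names are illustrative.

```python
def channel_moments(values):
    """Mean, variance and skewness of one channel's pixel values."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    skewness = third / variance ** 1.5 if variance > 0 else 0.0
    return mean, variance, skewness


def color_moments(pixels):
    """One (mean, variance, skewness) triple per color channel."""
    return [channel_moments(channel) for channel in zip(*pixels)]


pixels = [(0, 10, 5), (2, 10, 5), (4, 10, 11)]  # hypothetical 3-pixel image
descriptor = color_moments(pixels)
```

The result is a nine-number description of an RGB image, far more compact than a full histogram.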


color normalization: Techniques for 
normalizing the distribution of color 
values in a color image, so that 
the image description is invariant to 
illumination. One simple method for 
producing invariance to lightness is to 
use vectors of unit length for color 
entries, rather than coordinates in 


the color representation system. [WP: 
Color_normalization] 
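
The unit-length method can be sketched as follows; the function name is illustrative, and returning black unchanged is a choice made here to avoid division by zero.

```python
import math


def normalize_color(rgb):
    """Scale a color vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in rgb))
    if norm == 0:
        return (0.0, 0.0, 0.0)  # black has no direction; leave it as is
    return tuple(v / norm for v in rgb)


dim = normalize_color((10, 20, 20))
bright = normalize_color((20, 40, 40))  # same surface under doubled lighting
```

Both results are the same vector, so a uniform change in lightness no longer affects the description.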


color quantization: The quantization 
(i.e., discretization) of the image sig- 
nal into a number of bins each rep- 
resenting a specific level of intensity 
(i.e., pixel values). Occurs at image 
capture, at the CCD or CMOS sensor 
level, within the camera. For RGB color 


image capture this is facilitated using 
the Bayer pattern on the sensor itself. 
Also performed as a secondary process, 
re-quantization for color reduction or 
color re-mapping: 


[Figure: the same image at 256 and at 16 gray (color) quantization levels]


Coarser quantization allows image 
compression with fewer bits. 
[SB11:2.3.2] 
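Re-quantization as a secondary process can be sketched with uniform bins. The function name `requantize` and the choice to represent each bin by its centre value are assumptions; other schemes (e.g., non-uniform bins) are equally possible.

```python
def requantize(value, levels, max_value=255):
    """Map an intensity in [0, max_value] to the centre of one of `levels` bins."""
    bin_width = (max_value + 1) / levels
    # Clamp so that max_value falls in the last bin.
    index = min(int(value // bin_width), levels - 1)
    return int(index * bin_width + bin_width / 2)
```

With `levels=16`, all 256 input intensities collapse onto 16 output values, so each pixel needs 4 bits rather than 8.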


color re-mapping: An image transfor- 
mation where each original color is 
replaced by another color from a col- 
ormap. If the image has indexed col- 
ors, this can be a very fast opera- 
tion and can provide special graphical 
effects for very low processing over- 
head: [WAF+98] 


[Figure: original image and color-remapped image]


color representation system: A 2D 
or 3D space used to represent 
a set of absolute color coordi-
nates. The RGB and CIE L*u*v* models
are examples of such spaces. [WP:
List_of_color_spaces_and_their_uses] 


color similarity: The relative differ- 
ence between two color values in 
a given color space representation. 
See also similarity metric. Within per- 
ceptual psychology, the perceived 
color difference by human subjects 
under natural or controlled condi- 
tions. See also brightness constancy. 
[BWKW98:Ch. 2] 


color space: See color representation 
system. [WP:Color_space] 


color structure descriptor: Part of the 
compact set of MPEG-7 descriptors 
designed to capture the structure of 
the image color content. Uses an 
extension to the concept of a color 
histogram: a structuring element is 
used to produce a histogram based on 
local color occurrence within the local 
neighborhood (under the structuring 
element) instead of just at each indi- 
vidual pixel. See also color layout 
descriptor. [ISO02] 


color temperature: A scalar measure of 
color. 1) The temperature of a given
color C is the temperature in kelvins
at which a heated black body would
emit light that is dominated by color
C. It is relevant to computer vision 
in that the illumination color changes 
the appearance of observed objects. 
The color temperature of incandescent 
lights is about 3200 kelvins and sun- 
light is about 5500 kelvins. 
2) Photographic color temperature is 
the ratio of blue to red intensity. [WP: 
Color_temperature] 


color texture: Variations (texture) in the 
appearance of a surface (or region, 
illumination etc.) arising from spatial 
variations in the color, reflectance or 
lightness of a surface. [JH98] 


colorimetry: The measurement of color 
intensity relative to some standard. 
[WP:Colorimetry] 


colorization: The process of adding 
color to a gray scale image or video 
content. Performed manually or via an 
automatic process. [Sze10:10.3.2] 


combinatorial explosion: Correctly, 
how the computational requirements 
of an algorithm increase very quickly
relative to the increase in the number
of elements to be processed, as a con-
sequence of having to consider all com-
binations of elements. For example,
consider matching M model features
to D data features with D > M where
each data feature can be used at most
once and all model features must be
matched. Then the number of possi-
ble matchings that need to be consid-
ered is $D \times (D-1) \times (D-2) \times \cdots \times (D-M+1)$.
Here, if M increases by only one,
$D - M$ times as much matching effort
is needed (approximately D times
when D is much larger than M). Com-
binatorial explosion is used loosely to
refer to non-combination algorithms 
whose effort grows rapidly with even 
small increases in input data sizes. [WP: 
Combinatorial_explosion] 
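The matching count in the example above can be evaluated directly. This is a small sketch; the function name `count_matchings` is hypothetical.

```python
def count_matchings(num_data, num_model):
    """Number of ways to assign a distinct data feature to every model feature."""
    count = 1
    for k in range(num_model):
        count *= num_data - k  # D, then D-1, ..., down to D-M+1
    return count
```

For D = 100, going from M = 3 to M = 4 multiplies the number of candidate matchings by 97, which illustrates the growth described in the entry.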


compactness: A descriptor that is scale
invariant, translation invariant and
rotation invariant, based on the ratio
$\frac{\text{perimeter}^2}{\text{area}}$. [JKS95:2.5.7]
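A short sketch of the ratio, assuming perimeter and area have already been measured; the function names are hypothetical. A circle gives the smallest possible value, 4π, and any square gives 16 regardless of its size.

```python
import math


def compactness(perimeter, area):
    """Scale-invariant shape descriptor: perimeter squared over area."""
    return perimeter ** 2 / area


def circle_compactness(radius):
    """Compactness of a circle; equals 4*pi for every radius."""
    return compactness(2 * math.pi * radius, math.pi * radius ** 2)
```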

compass edge detector: A class of 
edge detector based on combining the 
response of separate edge operators 
applied at several orientations. The 
edge response at a pixel is commonly 
the maximum of the responses over 
the several orientations. [RT99] 


composite filter: Hardware or soft- 
ware image processing method based 
on a mixture of components such 
as noise reduction, feature detection 
and grouping. [WP:Composite_image_ 
filter] 


composite video: A television video 
transmission method created as a 
backward-compatible solution for the 
transition from black-and-white to 
color television. Black-and-white TV 
sets ignore the color component while 
color TV sets separate out the color 
information and display it with black- 
and-white intensity. [WP:Composite_ 
video] 


compression: See image compression. 


compression noise: Noise artifacts 
introduced by lossy compression tech- 
niques in image storage or trans- 
mission. See also blocking artifact. 
[SB11:1.3.2]


compressive sensing: An efficient sig-
nal representation technique that uses
a sparse representation to allow the 
signal to be stored and reconstructed 
using relatively few measurement sam- 
ples. Achieved by solving an under- 
determined system of linear equations 
in conjunction with the sparseness 
constraint on the original signal. Used 
in registration and image compression. 
Also denoted as “compressed sens-
ing”, “compressed sampling” or “sparse
sampling”. [Ela10:Ch. 9]


compressive video: 1) The use of
standard video compression for video
transmission or storage. 

2) The use of compressive sensing 
techniques for video transmission or 
storage. [SSC08] 


computational camera: A digital
camera that uses controllable optics
and computational decoding built 
into the device itself to produce new 
types of image, e.g., wide-field-of-view 
images, high-dynamic-range images, 
multi-spectral images and depth 
images. [Nay06] 


computational complexity: A bound
on the theoretical number of basic
computational operations required to 
perform a given algorithm or process 
independent of the hardware upon 
which it is implemented. For exam- 
ple an algorithm that iterates over 
every pixel in a $w \times h$ image
(e.g., a mean filter) has computational
complexity, denoted $O(\cdot)$, bounded by
$O(wh)$. [Mit97:7.1]


computational imaging: See
computational photography and
computational cameras.


computational photography: The use
of image analysis and image processing
techniques applied to one or more 
photographs to create novel images 
beyond the capabilities of tradi- 
tional digital camera photography. 
See also computational cameras. 
[Sze10:Ch. 10] 


computational symmetry: General
term referring to algorithmic treat-
ment of symmetries. See symmetry
detection, bilateral symmetry,
symmetric axis transform (SAT).
[LHKG10] 


computational theory: An approach 
to computer vision algorithm descrip- 
tion promoted by Marr. A process can 
be described at three levels: imple- 
mentation (e.g., as a program), algo- 
rithm (e.g., as a sequence of activities) 
and computational theory. This third 
level is characterized by the assump- 
tions behind the process, the mathe- 
matical relationship between the input 
and output processes and the descrip- 
tion of the properties of the input data 
(e.g., assumptions of statistical distri- 
butions). The advantage claimed for 
this approach is that it makes explicit 
the essentials of the process, which can 
then be compared to the essentials of 
other processes solving the same prob- 
lem. By this method, the implementa- 
tion details that can confuse compar- 
isons can be ignored. [MP79] 


computational tractability: Within
computational complexity theory,
whether or not a given process or 
algorithm is feasible within a given 
finite number of computational oper- 
ations (relating to time-scale) and 
resources (relating to memory or 
storage). Those which are not feasible 
are referred to as computationally 
intractable. [GJ79] 


computational vision: See computer 
vision. 


computed axial tomography: Also
known as CAT. An X-ray procedure
used in conjunction with vision tech- 
niques to build a 3D volumetric image 
from multiple X-ray images taken from 
different viewpoints. The procedure 
can be used to produce a series of 
cross sections of a selected part of the 
human body, that can be used for med- 
ical diagnosis. [WP:X-ray computed 
tomography] 


computer-aided design (CAD): 1) A
general term for design processes in
which a computer assists the designer, 
e.g., in the specification and layout 
of components. For example, most 
mechanical parts are now designed by 
a CAD process. 


2) A term used to distinguish objects 
designed with the assistance of a com- 
puter. [WP:Computer-aided_design] 


computer vision: A broad term for the
processing of image data. Every pro-
fessional will have a different defini- 
tion that distinguishes computer vision 
from machine vision, image processing 
or pattern recognition. The boundary 
is not clear, but the main issues that 
lead to this term being used are an 
emphasis on the underlying theories 
of optics, light and surfaces; underly- 
ing statistical, property and shape mod- 
els; theory-based algorithms (in con- 
trast to commercially exploitable algo- 
rithms); and issues related to what 
humans broadly relate to “understand- 
ing” as contrasted with “automation”. 
[JKS95:1.1]


concave mirror: The type of mirror
used for imaging, in which a concave
surface is used to reflect light to a 
focus. The reflecting surface is usu- 
ally rotationally symmetric about the 
optical or principal axis and the mir- 
ror surface can be part of a sphere, 
paraboloid, ellipsoid, hyperboloid or 
other surfaces. It is also known as a 
“converging mirror” because it brings 
light to a focus. In the case of 
a spherical mirror, the mirror focal 
point, F, is half way between the ver- 
tex and the sphere center, C: [WP: 
Curved_mirror#Concave_mirrors] 


[Figure: concave mirror, object and principal axis, with sphere center C]


concave residue: The set difference
between a shape and its convex hull.
For a convex shape, the concave
residue is empty. The figure shows
some shapes (in black) and their con-
cave residues (in gray): [RPLK08]


concavity: Loosely, a depression, dent,
hollow or hole in a shape or sur-
face. More precisely, a connected
component of a shape’s concave
residue. [WP:Concave]


concavity tree: A hierarchical descrip- 
tion of an object in the form of a tree. 
The concavity tree of a shape has the 
convex hull of its shape as the parent 
node and the concavity trees of its con- 
cavities as the child nodes. These are 
subtracted from the parent shape to 
give the original object. The concav- 
ity tree of a convex shape is the shape 
itself. The figure shows a gray shape 
and its concavity tree: [Dav90:6.6]


concurrence matrix: See
co-occurrence matrix. 


condensation tracking: Conditional 
density propagation tracking. The
particle filter technique applied by
Blake and Isard to edge tracking. A 
framework for object tracking with 
multiple simultaneous hypotheses that 
switches between multiple continuous 
autoregressive process motion mod- 
els according to a discrete transition 
matrix. Using importance sampling it is 
possible to keep only the N strongest 
hypotheses. [IB98] 


condenser lens: An optical device used 
to collect light over a wide angle and 
produce a collimated output beam. 
[WP:Condenser_(microscope)] 


conditional density propagation: The 
term “condensation” (as used in 
condensation tracking) is a loose 
acronym of conditional density prop- 
agation. See also particle filter. [IB98] 


conditional dilation: A binary image 
operation that is a combination of 
the dilate operator and a logical AND 
operator with a mask that only allows 
dilation into pixels that belong to the 
mask. This process can be described by 
the formula: $\mathrm{dilate}(X, J) \cap M$, where
$X$ is the original image, $M$ is the
mask and $J$ is the structuring element.
[HHR01]
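The operation can be sketched for small binary images stored as lists of 0/1 rows. The function names, the representation of the structuring element as a list of (row, column) offsets and the cross-shaped element `CROSS` are assumptions made for this example.

```python
# Hypothetical structuring element: a 3 x 3 cross given as (row, col) offsets.
CROSS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def dilate(image, offsets):
    """Binary dilation: each set pixel sets every pixel it reaches by an offset."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if image[r][c]:
                for dr, dc in offsets:
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        out[rr][cc] = 1
    return out


def conditional_dilate(image, mask, offsets):
    """Dilate, then AND with the mask so growth stays inside the mask."""
    dilated = dilate(image, offsets)
    return [[d & m for d, m in zip(d_row, m_row)]
            for d_row, m_row in zip(dilated, mask)]
```

Applying `conditional_dilate` repeatedly until the result stops changing grows a seed region to fill the part of the mask connected to it.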


conditional distribution: A distribu- 
tion of one variable given the val- 
ues of one or more other variables. 
See also conditional probability. [WP: 
Conditional_probability_distribution] 


conditional independence: Given 
three disjoint sets of variables X, Y 
and Z, if $p(X, Y \mid Z) = p(X \mid Z)\,p(Y \mid Z)$
then X and Y are conditionally
independent given Z. A special
case is if Z is the empty set: if
$p(X, Y) = p(X)\,p(Y)$ then X and Y are
independent. [Bis06:p. 372] 


conditional probability: Given two dis-
joint sets of random variables X and
Y, the conditional probability $p(X \mid Y)$
is defined as $p(X \mid Y) = \frac{p(X, Y)}{p(Y)}$ for
$p(Y) > 0$, and is read as the distri-
bution of X given the value of Y.
[Bis06:1.2]
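For discrete variables the definition amounts to selecting one slice of the joint table and renormalizing it. The sketch below assumes the joint distribution is stored as a dictionary keyed by (x, y) pairs; the function name `conditional` is hypothetical.

```python
def conditional(joint, y):
    """Return p(X | Y = y) from a joint table {(x, y): probability}."""
    # Marginal p(Y = y): sum the joint over all values of X.
    p_y = sum(p for (_, y_value), p in joint.items() if y_value == y)
    if p_y <= 0:
        raise ValueError("p(Y = y) must be positive")
    return {x: p / p_y for (x, y_value), p in joint.items() if y_value == y}
```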


conditional random field (CRF): An 
undirected graphical model (UGM) 
defines a joint probability distribution
over a set of random variables. A con-
ditional random field defines a condi- 
tional joint distribution over one set 
of variables y given another set of 
variables x. The form of this distribu- 
tion is similar to a UGM, involving a 
product of clique potentials. The CRF 
addresses the supervised learning task 
of predicting y given x. In computer 
vision, CRF is used in dense stereo 
matching, where the task is to esti- 
mate the depth of each pixel on the 
image grid given a pair of input images. 
[Mur12:Ch. 17] 


conditional replenishment: A method 
for coding video signals, where only 
the portion of a video image that has 
changed since the previous frame is 
transmitted. Effective for sequences 
with largely stationary backgrounds, 
but more complex sequences require 
more sophisticated algorithms that 
perform motion compensation. [WP: 
MPEG-1#Motion_vectors] 


cones (eye): Photoreceptive cells occur- 
ring on the fovea of the human eye 
and responsible for color vision. Per- 
form best in bright light. Significantly 
fewer in number than the co-existent, 
dim-light performing rods. [Oys99] 


confocal: In optics, two lenses that share
the same focal plane or focal point (i.e.,
coincident foci). Commonly used when
referring to confocal microscopy imag-
ing. [Pea88:Ch. 1]


conformal mapping: A function from 
the complex plane to itself, $f : \mathbb{C} \to \mathbb{C}$,
that preserves local angles. For
example, the complex function
$y = \sin(z) = -\frac{1}{2} i (e^{iz} - e^{-iz})$ is conformal
wherever its derivative $\cos(z)$ is nonzero.
[WP:Conformal_map] 


conic: Curves arising from the inter- 
section of a cone with a plane (also 
called conic sections). This is a family 
of curves including the circle, ellipse, 
parabola and hyperbola:


[Figure: examples of conic sections, including a hyperbola]


The general form for a conic in
2D is $ax^2 + bxy + cy^2 + dx + ey + f = 0$.
[JKS95:6.6]


conic fitting: The fitting of a geometric 
model of a conic section
$ax^2 + bxy + cy^2 + dx + ey + f = 0$ to a set of data
points $\{(x_i, y_i)\}$. Special cases include
fitting circles and ellipses. [JKS95:6.6] 


conic invariant: An invariant of a conic
section. If the conic is in canonical
form


$ax^2 + bxy + cy^2 + dx + ey + f = 0$


with $a^2 + b^2 + c^2 + d^2 + e^2 + f^2 = 1$,
then the two invariants to rotation and
translation are functions of the eigen-
values of the leading quadratic form
matrix $A = \begin{bmatrix} a & b \\ b & c \end{bmatrix}$. For example, the
trace and determinant are invariants
that are convenient to compute. For an
ellipse, the eigenvalues are functions
of the radii. The only invariant to affine
transformation is the class of the conic
(hyperbola, ellipse, parabola etc.). The
invariant to projective transformation
is the set of signs of the eigenvalues
of the 3 x 3 matrix representing the
conic in homogeneous coordinates.
[Wei88]


conical mirror: A mirror in the shape of
(possibly part of) a cone. It is particu-
larly useful for robot navigation since
a camera placed facing the apex of the
cone, with the cone’s axis aligned with
the optical axis and oriented towards its
base, can have a full 360° view. Conical
mirrors were used in antiquity to pro-
duce cipher images known as anamor-
phoses. [PM96]


conjugate direction: Optimization 
scheme where a set of independent 
directions are identified on the search 
space. A pair of vectors $\vec{u}$ and $\vec{v}$ are
conjugate with respect to matrix $A$
if $\vec{u}^\top A \vec{v} = 0$. A conjugate direction
optimization method is one in which 
a series of optimization directions 
are devised that are conjugate with 
respect to the normal matrix but do 
not require the normal matrix in order 
for them to be determined. [Has78]


conjugate gradient: A basic tech-
nique of numerical optimization in
which the minimum of a numer- 
ical target function is found by 
iteratively descending along non- 
interfering conjugate directions. The 
conjugate gradient method does not 
require second derivatives and can 
find the optimum of an N-dimensional
quadratic form in N iterations. By
comparison, a Newton optimization 
method requires one iteration and 
gradient descent can require an arbi- 
trarily large number of iterations. [WP: 
Conjugate_gradient_method] 
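The N-iteration property can be illustrated with the linear conjugate gradient method applied to a quadratic form. This is a minimal sketch under stated assumptions: the target is $\frac{1}{2} x^\top A x - b^\top x$ with $A$ symmetric positive definite, matrices are plain nested lists, and the function names are hypothetical.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(matrix, v):
    return [dot(row, v) for row in matrix]


def conjugate_gradient(A, b, tol=1e-12):
    """Minimize 0.5 x^T A x - b^T x for symmetric positive-definite A."""
    n = len(b)
    x = [0.0] * n
    r = list(b)   # residual b - A x, i.e. minus the gradient at x
    d = list(r)   # first search direction is steepest descent
    rs = dot(r, r)
    iterations = 0
    while rs > tol and iterations < n:
        Ad = matvec(A, d)
        alpha = rs / dot(d, Ad)  # exact minimizer along direction d
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]
        rs_new = dot(r, r)
        # New direction is made A-conjugate to the previous ones.
        d = [ri + (rs_new / rs) * di for ri, di in zip(r, d)]
        rs = rs_new
        iterations += 1
    return x, iterations
```

For a 2 x 2 system the loop stops after at most two iterations, up to floating-point rounding. Only matrix-vector products are needed, so no second derivatives are formed explicitly.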


connected component labeling: 1) A
standard graph problem. Given a graph
consisting of nodes and arcs, the prob- 
lem is to identify nodes forming a con- 
nected set. A node is in a set if it has 
an arc connecting it to another node in 
the set. 

2) Used in binary image and gray 
scale image processing to join together 
neighboring pixels into regions. There 
are several efficient sequential algo- 
rithms for this procedure. In this fig- 
ure, the pixels in each connected 
component have a different color: 
[JKS95:2.5.2] 


[Figure: connected components shown in different colors]
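The image-processing sense can be sketched with a breadth-first flood fill. Note that this is not one of the sequential (two-pass) algorithms the entry alludes to; it is a simpler alternative chosen for brevity. The function name `label_components` and the use of 4-connectivity are assumptions.

```python
from collections import deque


def label_components(image):
    """Label 4-connected foreground regions of a binary image; 0 is background."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] and not labels[r][c]:
                # Unlabeled foreground pixel: start a new component here.
                next_label += 1
                labels[r][c] = next_label
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels, next_label
```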


connectivity: See pixel connectivity. 


conservative smoothing: A noise-
filtering technique whose name
derives from the fact that it employs a
fast filtering algorithm that sacrifices
noise suppression power to preserve
the image detail. A simple form of
conservative smoothing replaces a
pixel that is larger (or smaller) than
its 8 connected neighbors by the 
largest (or smallest) value amongst 
those neighbors. This process works 
well with impulse noise but is not 
as effective with Gaussian noise. 
[ST99] 
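The simple form described above amounts to clamping each pixel to the range spanned by its neighbors. In this sketch the function name `conservative_smooth` is hypothetical and border pixels are left unchanged, which is an assumption rather than part of the definition.

```python
def conservative_smooth(image):
    """Clamp each interior pixel to the [min, max] range of its 8 neighbours."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]  # border pixels are copied unchanged
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            neighbours = [image[r + dr][c + dc]
                          for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                          if (dr, dc) != (0, 0)]
            low, high = min(neighbours), max(neighbours)
            out[r][c] = min(max(image[r][c], low), high)
    return out
```

An isolated impulse is pulled back to the nearest neighboring value, while a pixel already within its neighbors' range is untouched, which is why fine detail survives.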


constrained least squares: It is some-
times useful to minimize $\|A\vec{x} - \vec{b}\|^2$
over some subset of possible solu-
tions $\vec{x}$ that are predetermined. For
example, one may already know the
function values at certain points on
the parameterized curve. This leads
to an equality constrained version of
the least squares problem, stated as:
minimize $\|A\vec{x} - \vec{b}\|^2$ subject to $B\vec{x} = \vec{c}$.
There are several approaches to
the solution of this problem such as 
QR factorization and singular value 
decomposition. As an example, this 
regression technique can be useful in 
least squares fitting, where the plane 
described by x is constrained to be 
perpendicular to some other plane. 
[Hun73] 


constrained matching: A generic term 
for recognition approaches where two 
objects are compared under a con- 
straint on either or both. One exam- 
ple of this would be a search for mov- 
ing vehicles under 20 feet in length. 
[CR02] 


constrained optimization: Optimiza- 
tion of a function f subject to con- 
straints on the parameters of the func- 
tion. The general problem is to find 
the x that minimizes (or maximizes) 
$f(x)$ subject to $g(x) = 0$ and $h(x) \geq 0$,
where the functions $f$, $g$, $h$ may
all take vector-valued arguments and
$g$ and $h$ may also be vector-valued,
encoding multiple constraints to be sat- 
isfied. Optimization subject to equality 
constraints is achieved by the method 
of Lagrange multipliers. Optimization 
of a quadratic form subject to equal- 
ity constraints results in a generalized 
eigensystem. Optimization of a general 
$f$ subject to general $g$ and $h$ may be
achieved by iterative methods, most 
notably sequential quadratic program- 
ming. [WP:Constraint_optimization] 


constraint satisfaction: An approach to 
problem solving that consists of three 
components: 

• a list of what “variables” need values;
• a set of allowable values for each
  “variable”;
• a set of relationships that must hold
  between the values for each “vari-
  able” (i.e., the constraints).

In computer vision, this approach
has been used for structure labeling
(e.g., line labeling and region labeling)
and geometric model recovery tasks
(e.g., reverse engineering of 3D parts
or buildings from range data). [WP:
Constraint_satisfaction]


constructive solid geometry (CSG): 
A method for defining 3D shapes in 
terms of a mathematically defined set 
of primitive shapes. Boolean set theo- 
retic operations of intersection, union 
and difference are used to combine 
shapes to make more complex shapes. 
For example: [JKS95:15.3.2] 


[Figure: a complex shape built from primitive shapes]


content-based image retrieval: Image 
database indexing methods that pro- 
duce matches based on the contents 
of the images in the database, as 
contrasted with using text descrip- 
tors to do the indexing. For exam- 
ple, one can use descriptors based 
on color moments to select images 
with similar invariants. [WP:Content- 
based_image_retrieval] 


context: In vision, the elements, infor- 
mation or knowledge that occur with 
or accompany some data, contributing 
to the data’s full meaning. For example, 
in a video sequence one can speak of 
the spatial context of a pixel, indicat- 
ing the intensities at surrounding loca- 
tions in a given frame (image), or of the 
temporal context, indicating the inten- 
sities at that pixel location (same coor- 
dinates) but in previous and following 
frames. Information deprived of appro-
priate context can be ambiguous: e.g.,
differential optical flow methods can
only estimate the normal flow; the full 
flow can be estimated considering the 
spatial context of each pixel. At the 
level of scene understanding, know- 
ing that the image data comes from a 
theater performance provides context 
information that can help distinguish 
between a real fight and a stage act. 
[DH73:2.11] 


context-aware algorithm: A range
of techniques spanning tracking,
visual salience, object recognition and 
object detection that use contextual 
information in addition to the spe- 
cific subject of the task in maximiz- 
ing the efficacy of the approach, e.g., 
traversable scene region awareness in 
pedestrian detection. More generally 
the use of additional sensor informa- 
tion providing context (e.g., a global 
positioning system providing location 
cues for mobile devices). [YMH09] 


context dependent: A process
result or outcome achieved using
a context-aware algorithm. [YMH09] 


contextual event: Significant changes in
a scene that are independent of the
objects of interest (e.g., people) and 
location specific in a visual environ- 
ment. For example, an item may be 
left in the scene, a vehicle may enter 
or park, or the street lights may turn 
on. [GX11:7.1] 


contextual image classification: Algo-
rithms that take into account the
source or setting of images in their 
search for features and relationships 
in the image. Often this context 
is composed of region identifiers, 
color, topology and spatial relation- 
ships as well as task-specific knowl- 
edge. [MS03] 


contextual information: Additional
information about the image, video
or scene being analyzed that aids 
in the evaluation of the outcome. 
Obtained from secondary sensing, 
prior annotation of the content or the 
scene background excluding the pri- 
mary target objects of interest. See 
also context-aware algorithm. [GX11:
2.5]


contextual knowledge: A priori
contextual information about a given 
image or video (e.g., camera view- 
ing angle or distance to object of 
interest). See contextual information.
[GX11:5.1.3]


contextual method: Algorithms that 
take into account the spatial arrange- 
ment of found features in their search 
for new ones. [CG84] 


contextually incoherent: An anoma- 
lous occurrence that cannot be 
readily explained using contextual 
information or the a priori model 
in use (e.g., behavior model or 
prior distribution of occurrence). See 
anomaly detection. [GX11:Ch. 10] 


continuous convolution: The 
convolution of two continuous 
signals. When processing 2D images, 
the convolution of two images f and 
h is:
$g(x, y) = f(x, y) \otimes h(x, y) = \int\!\!\int f(\tau_u, \tau_v)\, h(x - \tau_u, y - \tau_v)\, d\tau_u\, d\tau_v$.
[LWL00]


continuous Fourier transform: See 
Fourier transform. 


continuous learning: A general term 
describing how a system continually 
updates its model of a process based on 
current data. For example, background 
modeling (for change detection) as the 
illumination changes during the day. 
[Doy00] 


continuous random variable: If the 
cumulative distribution function of a 
random variable X is a continuous 
function, then X is said to be a con- 
tinuous random variable. In this case 
the cumulative distribution function 
can be expressed as the integral of 
the corresponding probability density 
function. Contrast with a discrete 
random variable. Examples of con-
tinuous random variables include those
following the Gaussian distribution, the multi-variate
normal distribution and the gamma 
distribution. [GS92:2.3] 


contour: See object contour. 


contour analysis: Analysis of outlines of
image regions. [MBLS01]


contour following: See contour linking.


contour grouping: See contour linking.


contour length: The length of a contour 
in appropriate units of measurement. 
For instance, the length of an image 
contour in pixels. See also arc length. 
[WP:Arc_length] 


contour linking: Edge detection or 
boundary detection processes typically 
identify pixels on the boundary of 
a region. Connecting these pixels to 
form a curve is the goal of contour link-
ing. [GPSG01]


contour matching: See curve matching. 


contour partitioning: See curve
segmentation.


contour relaxation: A relaxation
approach to contour linking for scene
segmentation. [SHK03]


contour representation: See boundary 
representation. 


contour tracing: See contour linking.


contour tracking: See contour linking.


contourlets: A contour representation 
approach using a multi-resolution and 
multidirectional multifilter bank tech- 
nique. [DV05] 


contrast: 1) The difference in brightness 
values between two structures, such as 
regions or pixels. 
2) A texture measure. In a gray scale 
image, contrast, C, is defined as 


$C = \sum_i \sum_j (i - j)^2 P[i, j]$


where P is the gray-level co-occurrence 
matrix. [JKS95:7.2] 
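The texture sense of contrast can be sketched by first building a normalized gray-level co-occurrence matrix for one pixel offset and then taking the weighted sum of squared gray-level differences. The function names, the list-of-lists image format and the normalization to probabilities are assumptions for illustration.

```python
def gray_level_cooccurrence(image, offset, levels):
    """Normalized co-occurrence matrix P[i][j] for one (row, col) offset."""
    dr, dc = offset
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for r in range(rows):
        for c in range(cols):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                counts[image[r][c]][image[rr][cc]] += 1
                total += 1
    # Assumes the offset leaves at least one valid pixel pair (total > 0).
    return [[n / total for n in row] for row in counts]


def contrast(P):
    """Sum of (i - j)^2 * P[i][j] over all gray-level pairs."""
    return sum((i - j) ** 2 * p
               for i, row in enumerate(P)
               for j, p in enumerate(row))
```

A flat region gives a contrast of zero because all the mass of P lies on the diagonal, whereas a checkerboard pattern puts all the mass off the diagonal.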


contrast enhancement: Also known as
“contrast stretching”. Expands the dis-
tribution of intensity values in an image
so that a larger range of sensitivity in
the output device can be used. This
can make subtle changes in an image
more obvious by increasing the dis-
played contrast between image bright-
ness levels:


[Figure: an image after contrast enhancement]


Histogram equalization is one method
of contrast enhancement. [Kim97]


contrast stretching: See contrast
enhancement.


control point: Used in a parametric
smooth curve representation system to
specify the shape of the curve to a
given order of complexity (e.g., lin-
ear, quadratic or cubic). The curve
is defined as an algebraic relationship
between all of the specified control
points in the space but not necessar-
ily intersecting them:


[Figure: a smooth curve and its control points]


A point-based variant on algebraic
curves. Can also be extended to sur-
face representation. [FDFH96:11.2]


control strategy: The guidelines behind
the sequence of processes per-
formed by an automatic image analysis
or scene understanding system. For
instance, control can be top-down
(searching for image data that veri-
fies an expected target) or bottom-up
(progressively acting on image data
or results to derive hypotheses). The
control strategy may allow selection
of alternative hypotheses, processes,
parameter values etc. [SHB08:8.1]


convex hull: Given a set of points, S, the
convex hull is the smallest convex set
that contains S. The figure shows a 2D
example: [Dav90:6.6]


[Figure: a set of points and its convex hull]


convexity: A property of shape associ-
ated with being convex (i.e., a shape
outline curving outwards, such as the
exterior of a circle or sphere, hav-
ing interior angles measuring less than
180°). Mathematically it is defined as
$\frac{\text{convex hull perimeter length}}{\text{perimeter length}}$. See convex hull.
Antonym of non-convexity. [SB11:9.1]


convexity ratio: Also known as “solid- 
ity”. A measure that characterizes devi- 
ations from convexity. The ratio for 
shape X is defined as $\frac{\text{area}(X)}{\text{area}(C_X)}$, where
$C_X$ is the convex hull of X. A convex
figure has convexity factor 1, while all
other figures have convexity less than
1. [MK00]
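For a simple polygon the ratio can be sketched with two standard building blocks: the monotone-chain convex hull and the shoelace area formula. The function names and the polygon-as-vertex-list representation are assumptions made for this example.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Monotone-chain convex hull, returned in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula for a simple polygon given as ordered vertices."""
    n = len(vertices)
    twice_area = sum(vertices[i][0] * vertices[(i + 1) % n][1]
                     - vertices[(i + 1) % n][0] * vertices[i][1]
                     for i in range(n))
    return abs(twice_area) / 2


def convexity_ratio(polygon):
    """Area of the shape divided by the area of its convex hull."""
    return polygon_area(polygon) / polygon_area(convex_hull(polygon))
```

An L-shaped polygon with area 3 has a hull of area 3.5, giving a ratio of 6/7, while a square gives exactly 1.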


convolution operator: A widely used
general image processing and signal
processing operator that computes the
weighted sum $y(j) = \sum_i w(i)\,x(j - i)$,
where $w(i)$ are the weights, $x(i)$ is the
input signal and $y(j)$ is the result. Sim-
ilarly, convolutions of image data take
the form $y(r, c) = \sum_{i,j} w(i, j)\,x(r - i, c - j)$.
Similar forms using inte-
grals exist for continuous signals and
images. With appropriate choice of
the weight values, convolution can
compute low pass/smoothing, high
pass/differentiation filtering or tem-
plate matching/matched filtering, as
well as many other linear functions.
The image on the right of the figure
is the result of convolving (and then
inverting) the left image with a $[+1 \;\; -1]$
mask: [FP03:7.1.1]
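The one-dimensional weighted sum can be written out directly. This sketch computes the full-length output, treating samples outside the signal as zero; that boundary handling and the function name `convolve` are assumptions.

```python
def convolve(signal, weights):
    """Full 1D convolution: y(j) = sum over i of w(i) * x(j - i)."""
    n, m = len(signal), len(weights)
    out = [0] * (n + m - 1)
    for j in range(n + m - 1):
        for i in range(m):
            if 0 <= j - i < n:  # samples outside the signal count as zero
                out[j] += weights[i] * signal[j - i]
    return out
```

With the weights [+1, -1] the output is nonzero only where the signal changes, which is the differentiation behavior mentioned in the entry; with weights [1, 1] it sums adjacent samples, a crude smoothing.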
co-occurrence matrix: A representa-
tion commonly used in texture analy-
sis algorithms. It records the likelihood 
(usually empirical) of two features or 
properties being at a given position 
relative to each other. For example, 
if the center of the matrix M is posi- 
tion (a, b) then the likelihood that the 
given property is observed at an offset 
(i, j) from the current pixel is given 
by matrix value M(a + i, b + j). [WP: 
Co-occurrence_matrix] 


Cook-Torrance model: A computer 
graphics shading model for surface 
reflectance in 3D rendering. An alterna- 
tive to the common Phong reflectance 
model incorporating the physical prop- 
erties of reflectance and offering supe- 
rior reflectance rendering for rough 
and metallic surfaces. [BR03:III.2]


cooperative algorithm: An algorithm 
that solves a problem by a series of 
local interactions between adjacent 
structures, rather than some global 
process that has access to all data.
The value at a structure changes iter-
atively in response to changing val-
ues at the adjacent structures, such as
pixels, lines, regions etc. The expecta- 
tion is that the process will converge 
to a good solution. The algorithms 
are well suited for massive local paral- 
lelism (e.g., single instruction multiple 
data) and are sometimes proposed 
as models for human image process- 
ing. An early algorithm to solve the 
stereo correspondence problem used 
cooperative processing between ele- 
ments representing the disparity at a 
given picture element. See also belief 
propagation. [ZK00] 


coordinate system: A spanning set of 
linearly independent vectors defining 
a vector space. One example is the 
set generally referred to as the X, Y 
and Z axes. There are, of course, an 
infinite number of sets of three lin- 
early independent vectors describing 
3D space. The right-handed version of 
this is shown in the figure: [FP03:2.1.1] 


[Figure: right-handed coordinate system with X, Y and Z axes]


coordinate system transformation: A 
geometric transformation that maps 
points, vectors or other structures 
from one coordinate system to 
another. It is also used to express 
the relationship between two coordi- 
nate systems. Typical transformations 
include translation and rotation. See 
also Euclidean transformation. [WP: 
Coordinate_system#Transformations] 


coplanarity: The property of lying in the 
same plane. For example, three vectors 
$\vec{a}$, $\vec{b}$ and $\vec{c}$ are coplanar if their scalar
triple product $(\vec{a} \times \vec{b}) \cdot \vec{c}$ is zero.
[WP:Coplanarity] 
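The scalar triple product test can be sketched for 3D vectors stored as tuples. The function names and the tolerance used to absorb floating-point rounding are assumptions.

```python
def scalar_triple(a, b, c):
    """Compute (a x b) . c for 3D vectors."""
    cross = (a[1] * b[2] - a[2] * b[1],
             a[2] * b[0] - a[0] * b[2],
             a[0] * b[1] - a[1] * b[0])
    return sum(x * y for x, y in zip(cross, c))


def are_coplanar(a, b, c, tol=1e-9):
    """True when the three vectors span no volume (within a tolerance)."""
    return abs(scalar_triple(a, b, c)) <= tol
```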


coplanarity invariant: A projective 
invariant that allows the determina- 
tion of when five corresponding points 
observed in two (or more) views are 
coplanar in the 3D space. The five 
points allow the construction of a set 
of four collinear points whose cross 
ratio value can be computed. If the 
five points are coplanar, then the cross 
ratio value must be the same in the 
two views. In the figure, point A is 
selected and the lines AB, AC, AD and 
AE are used to define an invariant cross 
ratio for any line L that intersects them: 
[Gro94] 


copy detection: A digital image 
forensics technique for the automatic 
detection of in-image content dupli- 
cation (“copy and paste” tampering) 
and cross-media image duplication 
in the presence of changes to scale, 
compression and encoding character- 
istics: [Kim03] 


core line: See medial line. 


corner detection: See curve
segmentation.


corner feature detector: See interest
point feature detectors and curve
segmentation.


coronary angiography: A class of
image-processing techniques (usually
based on X-ray data) for visualizing
and inspecting the blood vessels
surrounding the heart (coronaries).
See also angiography.
[WP:Coronary_catheterization]


correlation: The correlation between
two numerical random variables X and
Y is defined as

\[ \mathrm{corr}(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}} \]

where cov denotes the covariance
and var denotes the variance. It is
a normalized form of the covariance.
[MKB79:2.2.2]


correlation-based optical flow
estimation: Optical flow estimated by
correlating local image texture at each
point in two or more images and
noting their relative movement. [CT98]


correlation-based stereo: Dense
reconstruction (i.e., at every pixel)
computed by the cross correlation of
local image neighborhoods in the two
images to find corresponding points
from which depth can be computed
by stereo triangulation. [FHM+93]


correlogram: 1) In general statistical
analysis, a plot of the autocorrelation
values against either distance or time.
2) In image processing, an extension
to the color image histogram recording
the statistical co-occurrence of two
colors i and j at a given distance, d, in the
image for a given distance range, d =
{0...2}. Elements in the correlogram
are thus indexed (i, j, d) for the
co-occurrence of color pair (i, j) at
separation d. See also color co-occurrence
matrix. [CM04:15.5.1]


correspondence-based morphing: An
approach to mesh model (model to
model) morphing based on point
correspondences. [FBH+01]


correspondence constraint: See
stereo correspondence problem.


correspondence problem: See stereo
correspondence problem.


cosine diffuser: Optical correction 
mechanism for correcting spatial 
responsivity to light. Since off-angle 
light is treated with the same response
as normal light, a cosine transfer is used
to decrease the relative responsivity.
[FSS06]


cosine integral images: An extension 
to the concept of integral images 
targeting non-uniform filters such as
Gaussian smoothing, Gabor filtering
and bilateral filtering. [EW11] 


cosine transform: Representation of a
signal in terms of a basis of cosine
functions. For a 1D even function f(x), the
cosine transform is

\[ F(u) = 2 \int_0^{\infty} f(x) \cos(2\pi u x)\, dx. \]

For a sampled signal f_{0..n-1}, the
discrete cosine transform is the vector
b_{0..n-1} where, for k ≥ 1:

\[ b_0 = \sqrt{\frac{1}{n}} \sum_{i=0}^{n-1} f_i \]

\[ b_k = \sqrt{\frac{2}{n}} \sum_{i=0}^{n-1} f_i \cos\left( \frac{\pi}{2n} (2i+1) k \right) \]

For a 2D signal f(x, y), the cosine
transform F(u, v) is

\[ F(u, v) = 4 \int_0^{\infty} \int_0^{\infty} f(x, y) \cos(2\pi u x) \cos(2\pi v y)\, dx\, dy \]

[Umb98:2.5.2]
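
The discrete cosine transform of a sampled signal can be sketched directly from its definition. The version below assumes the orthonormal scaling, with factor sqrt(1/n) for the first coefficient and sqrt(2/n) for the others. The function name `dct` is an illustrative choice, and the direct double loop is meant for clarity rather than speed.

```python
import math


def dct(f):
    """Direct O(n^2) discrete cosine transform of a 1D sampled signal.

    Assumes the orthonormal scaling: sqrt(1/n) for b_0, sqrt(2/n) for b_k, k >= 1.
    """
    n = len(f)
    b = [math.sqrt(1.0 / n) * sum(f)]
    for k in range(1, n):
        total = sum(
            f[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        )
        b.append(math.sqrt(2.0 / n) * total)
    return b
```

With this scaling the transform preserves signal energy, so the sum of squared coefficients equals the sum of squared samples. A constant signal maps to a single non-zero coefficient b_0.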


cost function: The function or met- 
ric quantifying the cost of a cer- 
tain action, move or configuration, 
that is to be minimized over a 
given parameter space. A key con- 
cept of optimization. See also Newton 
optimization method and functional 
optimization. [HZ00:3.2] 


coupled hidden Markov model: An
extension to the hidden Markov model
(HMM) approach for modeling the
interaction of temporal processes,
e.g., human interactions in behavior
analysis, each itself modeled by an
HMM. [GX11:3.3.1]


covariance: The covariance between
two numerical random variables X and
Y is defined as cov(X, Y) =
E[(X − E[X])(Y − E[Y])] where E[·] denotes
expectation. Note that cov(X, X) is the
variance of X. [MKB79:2.2.2]


covariance matrix: In a d-dimensional
random vector X, the covariance
between components X_i and X_j
is defined as
$\Sigma_{ij} = \mathrm{cov}(X_i, X_j) = E[(X_i - E[X_i])(X_j - E[X_j])]$
where E[X] is the expectation value of X.
These entries can be assembled into a
d × d symmetric covariance matrix Σ
where the diagonal elements hold the
variances of each component. See also
sample covariance. [MKB79:2.2.2]


covariance propagation: A method of
statistical error analysis, in which the
covariance of a derived variable can
be estimated from the covariances of
the variables from which it is derived.
For example, assume that independent
variables x and y are sampled from
multi-variate normal distributions with
associated covariance matrices C_x and
C_y. The covariance of the derived
variable z = ax + by is
C_z = a²C_x + b²C_y.
[HZ00:4.2]


crack code: A contour description
method that codes not the pixels
themselves but the cracks between
them. This is done as a four-directional
scheme:

crack code = {2, 2, 1, 2, 3, 2}

It can be viewed as a chain code with
four directions rather than eight.
[WP:Chain_code]


crack detection: A visual industrial
inspection for the detection of crack
faults in manufactured articles. Also
commonly applied to in-situ inspection
of load bearing engineering structures
or leak detection in pipework.
[Dav90:4.4.2]
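
The covariance and covariance propagation entries above can be checked on a small scalar example, where each covariance matrix reduces to a single variance. The sketch below uses the population form of the covariance (dividing by the number of samples). The sample values are made up for illustration so that x and y are exactly uncorrelated.

```python
def mean(values):
    return sum(values) / len(values)


def cov(x, y):
    """Population covariance E[(X - E[X])(Y - E[Y])] of two equal-length samples."""
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / len(x)


# Illustrative data: x and y have zero covariance and unit variance.
x = [-1.0, 1.0, -1.0, 1.0]
y = [-1.0, -1.0, 1.0, 1.0]
a, b = 2.0, 3.0

# Derived variable z = a*x + b*y.
z = [a * xi + b * yi for xi, yi in zip(x, y)]

# Propagated variance: a^2 * var(x) + b^2 * var(y).
propagated = a * a * cov(x, x) + b * b * cov(y, y)
```

Here `cov(z, z)` and `propagated` agree, as the propagation rule predicts for uncorrelated inputs. When the inputs are correlated, an extra cross term 2ab·cov(x, y) appears.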


crack edge: A type of edge used
in line-labeling research to represent
where two aligned blocks meet. In the
figure, neither a step edge nor a fold
edge is seen: [Yak76]

[Figure: two aligned blocks with the crack edge between them labeled]


crack following: Edge tracking on the 
dual lattice or “cracks” between pixels 
based on the continuous segments of 
line from a crack code. [MB99] 


Crimmins smoothing operator: 


An iterative algorithm for speckle 
(salt-and-pepper noise) reduction. 
It uses a nonlinear noise reduction 
technique that compares the intensity 
of each image pixel with its eight 
neighbors and either increments 
or decrements the value to try and 
make it more representative of its 
surroundings. The algorithm raises the 
intensity of pixels that are darker than 
their neighbors and lowers pixels that 
are relatively brighter. More iterations 
produce more reduction in noise but 
at the cost of increased blurring of 
detail. [Cri85] 


critical motion: In the problem of self- 


calibration of a moving camera, there 
are certain motions for which cali- 
bration algorithms fail to give unique 
solutions. Sequences for which self- 
calibration is not possible are known 
as “critical motion” sequences. [Stu97] 


cross correlation: Standard method
of estimating the degree to which
two series are correlated. Given
two series {x_i} and {y_i}, where i =
0, 1, 2, ..., (N − 1), the cross
correlation, r_d, at a delay d is defined as

\[ r_d = \frac{\sum_i (x_i - m_x)(y_{i-d} - m_y)}{\sqrt{\sum_i (x_i - m_x)^2}\,\sqrt{\sum_i (y_{i-d} - m_y)^2}} \]

where m_x and m_y are the means
of the corresponding sequences.
[Hec87:11.3.4]
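
A minimal sketch of the cross correlation at a delay d follows. The entry does not say how to treat indices i − d that fall outside the series. This version assumes that only overlapping index pairs contribute to the sums, which is one of several possible conventions. The function name is illustrative.

```python
import math


def cross_correlation(x, y, d):
    """Normalized cross correlation r_d of two equal-length series at delay d.

    Assumption: only indices i with 0 <= i - d < N contribute to the sums.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    pairs = [(x[i] - mx, y[i - d] - my) for i in range(n) if 0 <= i - d < n]
    numerator = sum(u * v for u, v in pairs)
    denominator = math.sqrt(sum(u * u for u, _ in pairs)) * math.sqrt(
        sum(v * v for _, v in pairs)
    )
    return numerator / denominator if denominator else 0.0
```

At zero delay a series correlated with itself scores 1 and a series correlated with its negation scores −1, matching the interpretation used in cross-correlation matching.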


cross-correlation matching: Matching 


based on the cross correlation of two 
sets. The closer the correlation is to 
1, the better the match is. For exam- 
ple, in correlation-based stereo, for 


each pixel in the first image, the cor- 
responding pixel in the second image 
is the one with the highest correlation 
score, where the sets being matched 
are the local neighborhoods of each 
pixel. [NA05:5.3.1] 


cross ratio: The simplest projective 
invariant. It generates a scalar from 
four points of any 1D projective space 
(e.g., a projective line). The cross ratio 
for the points ABCD in the figure is: 


\[ \frac{(r+s)(s+t)}{s\,(r+s+t)} \]


[FP03:13.1] 
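
The invariance of the cross ratio can be illustrated numerically. The sketch assumes that r, s and t denote the successive distances AB, BC and CD along the line, since the figure defining them is not reproduced here. The particular points and the projective map (2x + 1)/(x + 3) are arbitrary illustrative choices. Exact rational arithmetic avoids rounding questions.

```python
from fractions import Fraction


def cross_ratio(a, b, c, d):
    """Cross ratio of four collinear points given by 1D coordinates.

    Assumes r = AB, s = BC, t = CD (signed differences along the line).
    """
    r, s, t = b - a, c - b, d - c
    return ((r + s) * (s + t)) / (s * (r + s + t))


def projective_map(x, coeffs):
    """1D projective transformation x -> (a*x + b) / (c*x + d)."""
    a, b, c, d = coeffs
    return (a * x + b) / (c * x + d)


points = [Fraction(0), Fraction(1), Fraction(3), Fraction(6)]
before = cross_ratio(*points)

# Hypothetical transformation chosen only for the demonstration.
mapped = [projective_map(p, (2, 1, 1, 3)) for p in points]
after = cross_ratio(*mapped)
```

Both `before` and `after` evaluate to 5/4 even though the individual distances between the mapped points differ from the originals.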


cross-section function: Part of the
generalized cylinder representation 
that gives a volumetric representa- 
tion of an object. The representation 
defines the volume by a curved axis, a 
cross section and a cross-section func- 
tion at each point on that axis. The 
cross-section function defines how the 
size or shape of the cross section varies 
as a function of its position along the 
axis. See also generalized cone. The fig- 
ure shows how the size of the square 
cross section varies along a straight 
line to create a truncated pyramid: 
[PCM89] 


[Figure: a cross section, an axis and a cross-section function combining to form a truncated pyramid]


cross-validation: When comparing
different models, one cannot simply
compare their performance on the
training set, because of the danger
of over-fitting. This can be guarded
against using a validation set, but if the
data available for training and
validation is limited, it can be more efficient
to use cross-validation. In this scheme
the data is divided into k folds, and for
each fold i = 1, ..., k training is
carried out on all folds except the ith
and validation is carried out on the ith.
The validation results from each fold
are then averaged to obtain an overall
performance estimate. A leave-one-out
test is a special case when k is equal
to the number of training examples.
[HTF08:7.10]


crossing number: The crossing number
of a graph is the minimum number of
arc intersections in any drawing
of that graph. A planar graph has
a crossing number of zero. The figure
shows a graph with a crossing number
of one: [Dav90:6.8.1]


crowd flow analysis: Using motion
properties (e.g., the optical flow field)
computed from (human) crowd video
sequences for behavior analysis tasks,
such as anomalous behavior detection:
[ABF06]

[Figure: simulated crowd]


CSG: See constructive solid geometry.


CT: See X-ray CAT.
[WP:X-ray_computed_tomography]


cubic spline: A spline where the weight
functions are third-order polynomials.
[PTVF92:Ch. 3]


cuboid descriptor: A localized
3D spatio-temporal space feature
descriptor describing a region within
a video sequence, akin to the use
of region descriptors for 2D images.
May also be applied to an object in an
explicit 3D voxel space. [GX11:5.4.1]


cumulative abnormality score: A
measurement of abnormality used in
behavior analysis, given an a priori
behavior model, that alleviates
the effect of noise by accumulating
the temporal history of the likelihood
of behavioral anomaly occurrences in
each region over time. Implemented
with a threshold to control anomalous
behavior detection alerts in the
presence of noise (with additional temporal
decay for occurrences below that
threshold). [GX11:15.2.4]


cumulative anomaly score: See
cumulative abnormality score.


cumulative distribution function: The
cumulative distribution function of a
random variable X is defined as F(x) =
Pr(X ≤ x), i.e., the probability that X
does not exceed the value x. The
range of F is [0, 1]. This definition also
applies to vector-valued random
variables. [MKB79:2.1]


cumulative histogram: A histogram
where the bin contains not only the
count of all instances having that
value but also the count of all bins
having a lower index value. This is
the discrete equivalent of the
cumulative probability distribution. The
figure on the right is the cumulative
histogram corresponding to the
normal histogram on the left:
[WP:Histogram#Cumulative_histogram]
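
The relationship between a histogram and its cumulative histogram can be sketched as a running sum over the bins. The helper names and the assumption that values are already integer bin indices are illustrative simplifications.

```python
from itertools import accumulate


def histogram(values, num_bins):
    """Count occurrences of integer values in the range 0..num_bins-1."""
    counts = [0] * num_bins
    for v in values:
        counts[v] += 1
    return counts


def cumulative_histogram(counts):
    """Each bin holds its own count plus the counts of all lower-index bins."""
    return list(accumulate(counts))


counts = histogram([0, 1, 1, 3, 3, 3], num_bins=4)
cumulative = cumulative_histogram(counts)

# Dividing by the total gives a discrete estimate of the cumulative
# distribution function, rising monotonically to 1.
empirical_cdf = [c / cumulative[-1] for c in cumulative]
```

The last bin of the cumulative histogram always equals the total number of samples, which is why normalizing by it yields values in [0, 1].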


cumulative scene vector: A vector
representation of temporally accumulated
activity classification output for
a given scene (one activity class per 
vector dimension). Used to overcome 
the problem that small periods of 
scene inactivity cause semantic break- 
down in the analysis of per video 
frame scene vector measurements for 
behavior analysis. [GX11:7.2.1] 


currency verification: Algorithms for 


checking that printed money and 
coinage are genuine. A special- 
ist field involving optical character 
recognition. [FGP96] 


curse of dimensionality: The
exponential growth of possibilities as a
function of dimensionality. This may
manifest itself in several effects as the
dimensionality increases:

• an increased amount of computational
  effort required;
• an exponentially increasing amount
  of data required to populate the
  data space in order that training
  works;
• all data points tending to become
  equidistant from each other.

This causes problems for clustering
and machine learning algorithms.
[WP:Curse_of_dimensionality]
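
The third effect listed above, distances becoming nearly equal, can be observed with a small experiment. The sketch draws uniformly random points in a unit hypercube and measures how much the farthest and nearest distances from a query point differ, relative to the nearest. The function name, point count and seed are arbitrary illustrative choices.

```python
import math
import random


def relative_contrast(dim, num_points=200, seed=0):
    """Return (d_max - d_min) / d_min for random points around a random query.

    Points are drawn uniformly from the unit hypercube of the given dimension.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(num_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))
    return (max(distances) - min(distances)) / min(distances)
```

In two dimensions the nearest point is typically many times closer than the farthest, so the contrast is large. In several hundred dimensions all distances cluster around a common value and the contrast falls well below 1.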


cursive script recognition: Meth- 


ods of optical character recognition 
whereby hand-written cursive (also 
called joined-up) characters are auto- 
matically classified. [BM02:5.2] 


curvature: Usually meant to refer to
the change in shape of a curve or
surface. Mathematically, the curvature
$\kappa$ of a curve is the length of the
second derivative $|d^2 x(s)/ds^2|$ of the curve
x(s) parameterized as a function of
arc length s. A related definition holds 
for surfaces, only here there are two 
distinct principal curvatures at each 
point on a sufficiently smooth surface. 
[NA05:4.6] 
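
The definition of curvature as the length of the second derivative with respect to arc length can be checked numerically on a circle, whose curvature is known to be the reciprocal of its radius. The central-difference step `h` and the function names are illustrative assumptions.

```python
import math


def circle(s, radius):
    """Point on a circle of the given radius, parameterized by arc length s."""
    return (radius * math.cos(s / radius), radius * math.sin(s / radius))


def curvature(curve, s, h=1e-3):
    """Estimate |d^2 x / ds^2| at s using central finite differences.

    curve must map an arc-length value to an (x, y) point.
    """
    p_prev, p_here, p_next = curve(s - h), curve(s), curve(s + h)
    ddx = (p_prev[0] - 2.0 * p_here[0] + p_next[0]) / (h * h)
    ddy = (p_prev[1] - 2.0 * p_here[1] + p_next[1]) / (h * h)
    return math.hypot(ddx, ddy)
```

For a circle of radius 2 the estimate is close to 0.5 at every arc-length position. The estimate is only meaningful when the curve really is parameterized by arc length.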


curvature primal sketch: A multi-scale 
representation of the significant 
changes in curvature along a planar 
curve. [NA05:4.8] 


curvature scale space: A multi-scale 


representation of the curvature zero- 
crossing points of a planar object 
contour as it evolves during smooth- 
ing. It is found by parameterizing 
the contour using arc length, which 
is then convolved with a Gaussian 
smoothing of increasing standard devi- 
ation. Curvature zero-crossing points 
are then recovered and mapped to 
the scale-space representation of the 
image with the horizontal axis repre- 
senting the arc length parameter on the 
original contour and the vertical axis 
representing the standard deviation 
of the Gaussian filter. [WP:Curvature_ 
Scale_Space] 


curvature sign patch classification: 


A method of local surface classifi- 
cation based on its mean curvature 
and Gaussian curvature signs, or 
principal curvature sign class. See also 
mean and Gaussian curvature shape 
classification. [HJ87] 


curve: A set of connected points in two
or three dimensions, where each point
has at most two neighbors. The curve
could be defined by a set of connected
points, by an implicit function (e.g.,
$y + x^2 = 0$), by an explicit form (e.g.,
$(t, -t^2)$ for all t) or by the
intersection of two surfaces (e.g., by
intersecting the planes X = 0 and Y = 0).
[NA05:4.6.2]


curve binormal: The vector perpendic- 


ular to both the curve tangent vector 
and curve normal vector at any given 
point: [Mor95] 


[Figure: tangent, normal and binormal vectors at a point on a curve]


curve bitangent: A line tangent to 


a curve or surface at two different 
points: [WP:Bitangent]


curve evolution: An abstraction method
whereby a curve can be iteratively
simplified, as in this figure:

[Figure: successive stages of curve evolution]


For example, a relevance measure is 
assigned to every vertex in the curve. 
The least important can be removed 
at each iteration by directly connect- 
ing its neighbors. This elimination 
is repeated until the desired stage 
of abstraction is reached. Another 
method of curve evolution is progres- 
sive curve smoothing with Gaussian 
smoothing of increasing standard devi- 
ation. [TYW01] 
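
The vertex-removal form of curve evolution described above can be sketched as follows. The entry does not fix a particular relevance measure. This sketch assumes, purely for illustration, that a vertex's relevance is the area of the triangle it forms with its two neighbors, and it treats the curve as an open polyline whose endpoints are always kept.

```python
def triangle_area(p, q, r):
    """Area of the triangle with 2D vertices p, q, r."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2.0


def evolve_curve(points, target_vertices):
    """Repeatedly remove the least relevant interior vertex of an open polyline.

    Relevance is assumed to be the triangle area spanned with the two
    neighbors; removing a vertex directly connects those neighbors.
    """
    pts = list(points)
    while len(pts) > max(target_vertices, 2):
        relevance = [
            triangle_area(pts[i - 1], pts[i], pts[i + 1])
            for i in range(1, len(pts) - 1)
        ]
        least = relevance.index(min(relevance)) + 1
        del pts[least]
    return pts
```

On a polyline with one nearly collinear vertex, that vertex is the first to be removed because it contributes least to the shape.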


curve fitting: Methods for finding the
parameters of a best-fit curve through
a set of 2D (or 3D) data points.
This is often posed as a minimization
of the least-squares error between
some hypothesized curve and the data
points. If the curve, y(x), can be
thought of as the sum of a set of m
arbitrary basis functions X_k and
written

\[ y(x) = \sum_{k=1}^{m} a_k X_k(x) \]

then the unknown parameters are the
weights a_k. The curve-fitting process
can then be considered as the
minimization of some log-likelihood
function giving the best fit to N points
whose Gaussian error has standard
deviation σ_i. This function may be
defined as

\[ \chi^2 = \sum_{i=1}^{N} \left[ \frac{y_i - y(x_i)}{\sigma_i} \right]^2 \]

The weights that minimize this can be
found from the design matrix D

\[ D_{i,j} = \frac{X_j(x_i)}{\sigma_i} \]

by finding the least-squares solution
to the linear equation

\[ D a = r \]

where the vector r_i = y_i / σ_i. [NA05:4.6.2]
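
A minimal sketch of fitting the weights with a design matrix follows. It builds D and r as described above and solves the overdetermined system in the least-squares sense through the normal equations (DᵀD)a = Dᵀr, which is one common choice rather than the only one. The helper names are illustrative, and the small Gaussian elimination routine stands in for a library solver.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_weights(xs, ys, sigmas, basis):
    """Least-squares weights a_k for y(x) = sum_k a_k * X_k(x)."""
    # Design matrix D[i][j] = X_j(x_i) / sigma_i and vector r[i] = y_i / sigma_i.
    D = [[X(x) / s for X in basis] for x, s in zip(xs, sigmas)]
    r = [y / s for y, s in zip(ys, sigmas)]
    m = len(basis)
    # Normal equations: (D^T D) a = D^T r.
    normal = [[sum(row[p] * row[q] for row in D) for q in range(m)] for p in range(m)]
    rhs = [sum(row[p] * ri for row, ri in zip(D, r)) for p in range(m)]
    return solve_linear_system(normal, rhs)


# Straight-line fit with basis functions X_1(x) = 1 and X_2(x) = x.
weights = fit_weights(
    xs=[0.0, 1.0, 2.0, 3.0],
    ys=[1.0, 3.0, 5.0, 7.0],
    sigmas=[1.0, 1.0, 2.0, 2.0],
    basis=[lambda x: 1.0, lambda x: x],
)
```

Because the example data lie exactly on y = 1 + 2x, the recovered weights are 1 and 2 regardless of the per-point standard deviations. For ill-conditioned bases a QR or SVD based solver is usually preferred over the normal equations.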


curve inflection: A point on a curve 


where the curvature is zero as it 
changes sign from positive to negative: 
[FP03:19.1.1]


curve invariant: Measures taken over a
curve that remain invariant under
certain transformations, e.g., arc length
and curvature are invariant under 
Euclidean transformations. [HC96] 


curve invariant point: A point on 


a curve that has a geometric prop- 
erty that is invariant to changes 
in projective transformation. Subse- 
quently, the point can be identified and 
used for correspondence in multiple 
views of the same scene. Two well- 
known planar curve invariant points 
are curvature inflection points and 
bitangent points: [LSW88]


curve matching: The comparison of
data sets to previously modeled curves
or other curve data sets. If a modeled
curve closely corresponds to a data
set then an interpretation of similarity
can be made. Curve matching differs
from curve fitting in that curve fitting
involves estimating the parameters of
theoretical models rather than
comparing against actual examples. [Wol90]


curve normal: The vector perpendicu- 
lar to the tangent vector to a curve at 
any given point, which also lies in the 
plane that locally contains the curve at 
that point: [PH03]


curve representation system: Methods
of representing or modeling curves
parametrically. Examples include: 
b-splines, crack codes, cross-section 
functions, Fourier shape descriptors, 
intrinsic equations, polycurves, 
polygonal approximations, radius 
vector functions, snakes and splines. 
[JKS95:6.1] 


curve saliency: A voting method for 
the detection of curves in a 2D or 3D 
image. Each pixel is convolved with 
a curve mask to build a saliency map. 
This map will hold high values for loca- 
tions in space where likely candidates 
for curves exist. [JT03] 


curve segmentation: Methods of
identifying and splitting curves into
primitive types. The location of changes
between one primitive type and
another is particularly important. For
example, a good curve segmentation
algorithm should detect the four
lines that make up a square. Methods
include corner detection, Lowe’s curve
segmentation and recursive splitting.
[KLP94]


curve smoothing: Methods for rounding
polygon approximations or vertex-based
approximations of surface
boundaries. Examples include b-spline
in 2D and NURBS in 3D. See also curve
evolution. The figure shows a polygonal
data curve smoothed by a Bezier
curve: [Oli93]

[Figure: polygonal data curve and the smoothed (Bezier) curve]


curve tangent vector: The vector that 


is instantaneously parallel to a curve at 
any given point: [WP:Tangent_vector]


curvelet: An extension of the wavelet
concept commonly known as the
“curvelet transform”, forming a non- 
adaptive technique for multi-scale 
object representation specifically tar- 
geted at the shortcomings of wavelet 
approaches in representing continu- 
ous curves and contours. [SCD02] 


cut detection: The identification of the
frames in film or video where the
camera viewpoint suddenly changes, either
to a new viewpoint within the current
scene or to a new scene.
[WP:Shot_transition_detection]


cutset sampling: A variant on the 
sampling methodology for Bayesian 
networks that samples only a subset 
of variables in the network and applies 
exact inference to the rest. [BD07] 


Cyclopean view: A term used in 
binocular stereo image analysis, based 
on the mythical one-eyed Cyclops. 
When stereo triangulation of a scene 
occurs based on two cameras, one 
has to consider on which coordi- 
nate system to base the reconstructed 
3D coordinates or which viewpoint 
to use when presenting the recon- 
struction. The Cyclopean viewpoint is 
located at the midpoint of the baseline 
between the two cameras. [CSBT03] 


cylinder extraction: Methods of identi- 
fying the cylinders and the constituent 
data points from 2.5D and 3D images 
that are samples from 3D cylinders. 
[MDN97]


cylinder patch extraction: Given a
range image or a set of 3D data points, 
cylinder patch extraction finds sets 
of points (usually connected) that lie 
on the surface of a cylinder and usu- 
ally also the equation of that cylinder. 
This process is useful for detecting and 
modeling pipework in range images of 
industrial scenes. [FFE97] 


cylindrical mosaic: A mosaicing 
approach where individual 2D images 
are projected onto a cylinder. This 
is possible only when the camera 
rotates about a single axis or the 
camera center of projection remains 
approximately fixed with respect 
to the distance to the nearest scene 
points. [SKG+98] 


cylindrical surface region: A region of 
a surface that is locally cylindrical. A 
region in which all points have zero 
Gaussian curvature and nonzero mean 
curvature. [WB94] 


darkfield illumination: A specialized 


technique that uses oblique illumina- 
tion to enhance contrast in subjects 
that are not imaged well under nor- 
mal illumination conditions. [Gal90: 
2.1.1] 


data association: In a tracking problem, 


the task of determining which obser- 
vations are informative and which are 
not. [FP03:17.4] 


data augmentation: The introduction 


of unobserved data (or latent or aux- 
iliary variables) to the original data. 
This strategy is used in the expectation 
maximization (EM) algorithm for maxi- 
mizing a likelihood function and in the 
Markov chain Monte Carlo methods. 
[HTF08:8.5.2] 


data fusion: See sensor fusion.
[WP:Data_fusion]


data integration: See sensor fusion.
[WP:Data_integration]


data mining: The process of extracting
useful information from large data
sets. This information may be
descriptive (providing understanding of the
structure or patterns in the data)
or predictive (prediction of one or
more variables based on others). The
emphasis is on enabling humans to
learn underlying structural patterns or
behaviors from the data rather than
on autonomous machine performance.
[Mur12:1.1.1]


data parallelism: Reference to the
parallel structuring of the input to
programs, the organization of programs
or the programming language
used. Data parallelism is a useful
model for much image processing
because the same operation can be
applied independently and in parallel
at all pixels in the image.
[Sch89:Ch. 8]


data reduction: A general term for
processes that reduce the number of data
points (e.g., by subsampling, using
a cluster center of mass as a
representative point or decimation) or
the number of dimensions in each
data point (e.g., by projection or
principal component analysis).
[WP:Data_reduction]


data structure: A fundamental concept
in programming: a collection of
computer data organized in a precise
structure, e.g., a tree (see quadtree), a queue
or a stack. Data structures are
accompanied by sets of procedures or libraries
that implement various types of data
manipulation, e.g., storage and
indexing. [WP:Data_structure]


DCT: See discrete cosine transform.


deblocking filter: A filter applied in
compressive video decoding when
block coding has been used to
act as an appearance enhancement
transform removing any macro-scale
blocking artifacts. Used in decoding
a number of compressive video
encoding schemes including the H.263
and H.264 streaming video formats.
[LJL+03]


deblur: To remove the effect of a known
blurring function on an image. If an
observed image I is the convolution
of an unknown image I′ and a known
blurring kernel B, so that I = I′ ∗
B, then deblurring is the process of
estimating I′ given I and B. See
deconvolution, image restoration and
Wiener filtering. [BLM90]


Dictionary of Computer Vision and Image Processing, Second Edition.
R. B. Fisher, T. P. Breckon, K. Dawson-Howe, A. Fitzgibbon, C. Robertson, E. Trucco and C. K. I. Williams.
© 2014 John Wiley & Sons, Ltd. Published 2014 by John Wiley & Sons, Ltd.


decay factor: The rate of decrease in
a variable. This rate may decrease as
a function of another
variable, e.g., varying exponentially
with time. In a discrete system with
parameter x, x_{t+1} = αx_t, where α < 1 is
the decay factor over timestep t.
[HS98]


decentering distortion (lens): A com- 


mon cause of tangential distortion. It 
arises when the lens elements are not 
perfectly aligned and creates an asym- 
metric component to the distortion. 
[WP:Distortion_(optics)#Software_correction]


decimation: 1) In digital signal pro- 
cessing, a filter that keeps one sam- 
ple out of every N, where N is a 
fixed number. See also subsampling. 
2) “Mesh” decimation: merging of sim- 
ilar adjacent surface patches or surface 
mesh vertices in order to reduce the 
size of a model. Often used as a 
processing step when deriving a sur- 
face model from a range image. [WP: 
Downsampling] 
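
The first sense of decimation, keeping one sample out of every N, can be sketched with list slicing. The function name and the `offset` parameter are illustrative assumptions. Practical decimators usually apply a low-pass filter first to limit aliasing, a step this sketch deliberately omits.

```python
def decimate(samples, n, offset=0):
    """Keep one sample out of every n, starting at the given offset.

    No anti-aliasing filter is applied; this only illustrates the sample
    selection step.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return list(samples[offset::n])
```

For example, decimating the sequence 0..9 with N = 3 keeps the samples at indices 0, 3, 6 and 9.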


decision boundary: A classifier taking 
as input a vector x in $\mathbb{R}^d$ divides
this space into a number of regions 
and all points in a given (decision) 
region are assigned to a particular 
class. The boundaries between these 
regions are known as decision bound- 
aries. [Bis06:1.5.1] 


decision forest: A generalized term 


encompassing variations on the 
random forest ensemble learning 
concept. [Cri11]


decision tree: A tool for helping to 
choose between several courses of 
action. An effective structure within 
which an agent can search options 
and investigate possible outcomes. It 
also helps to balance the risks and 
rewards associated with each possi- 
ble course of action. A specific exam- 
ple is a tree classifier, where the 
result of a sequence of decisions is 
the assignment of an input x to a 
label residing at the attained leaf:
[HTF08:9.2]

[Figure: decision tree whose nodes are marked as decisions or results]


decoding: Converting a signal that has
been encoded back into its original
form (lossless coding) or into a form
close to the original (lossy coding).
See also image compression.
[WP:Decoding]


decomposable filter: A complex filter
that can be applied as a number of
simpler filters applied one after the
other. For example, the 2D Laplacian
of Gaussian filter can be decomposed
into four simpler filters. [PY01]


deconvolution: The inverse process of 


convolution. Deconvolution is used to 
remove certain signals (e.g., blurring) 
from images by inverse filtering (see 
deblur). For a convolution producing 
image b = f ∗ g + n given f and g (the
image and convolution mask), where n
is the noise and ∗ denotes convolution,
deconvolution attempts to estimate f.
Deconvolution is often an ill-posed 
problem and may not have a unique 
solution. See also image restoration. 
[Low91:14.5] 


defocus: Blurring of an image, either 


accidental or deliberate, by incor- 
rect use or estimation of focus or 
viewpoint parameters. See also depth 
from defocus and shape from defocus. 
[Hor86:6.10] 


defocus blur: Deformation of an image 


because of the predictable behavior 
of optics when incorrectly adjusted. 
The blurring is the result of light rays 
that, after entering the optical system, 
misconverge on the imaging plane. If 
the camera parameters are known in 


advance, the blurring can be partially 
corrected. [Hor86:6.10] 


defogging: Physics-based image restora- 


tion techniques which seek to remove 
the optical effects of light transmission 
through air (atmosphere) that is heav- 
ily saturated with water droplets (i.e., 
fog) by some form of contrast enhance- 
ment. Areas of the image are generally 
found to be degraded as a function of 
depth. [NN02] 


deformable model: Object descrip- 
tors that model a specific class of 
deformable objects (e.g., eyes and 
hands) where the shapes vary accord- 
ing to the values of the parame- 
ters. If the general, but not specific, 
characteristics of an object type are 
known, a deformable model can be 
constructed and used as a matching 
template for new data. The degree 
of deformation needed to match the 
shape can be used as a matching score. 
See also modal deformable model and 
geometric deformable model. [WP: 
Active_contour_model] 


deformable shape: See deformable 
model. 


deformable shape registration: A 


form of registration between two 
object shape boundaries that allows a 
relaxed constraint that those bound- 
aries may be deformed via a non-affine 
transformation, from one boundary to 
the other to achieve alignment. Com- 
monly used between sequential video 
frames to perform deformable object 
tracking. See also non-rigid registration 
and elastic registration. Contrasts with 
affine registration. 


deformable superquadric: A type of 


superquadric volumetric model that 
can be deformed by bending, twisting 
etc. in order to fit to the data being 
modeled. [TM90] 


deformable template model: See 


deformable model. 


deformation energy: The metric that 


must be minimized when determin- 
ing an active shape model. Com- 
posed of terms for both internal 
energy (or force), arising from 
the model shape deformation, and 


external energy (or force), arising from 
the discrepancy between the model 
shape and the data. [WP:Internal_ 
energy#Description_and_definition] 


degradation: A loss of quality suffered 


by an image, the content of which 
is corrupted by unwanted processes. 
For instance, MPEG compression- 
decompression can alter some inten- 
sities, so that the image is degraded. 
See also JPEG image compression 
and image noise. [WP:Degradation_ 
(telecommunications)] 


degree of freedom: A free variable in a 


given function. For instance, rotations 
in 3D space depend on three angles, so 
that a rotation matrix has nine entries 
but only three degrees of freedom. 
[Nal93:3.1.3] 


dehazing: Image restoration techniques 


that seek to remove various optical 
effects that have reduced the sharpness
of detail in an image. See also
defogging. [NN02]


Delaunay triangulation: The Delaunay 


graph of the point set can be con- 
structed from its Voronoi diagram by 
connecting the points in adjacent poly- 
gons. The connections form the Delau- 
nay triangulation, which has the prop- 
erty that the circumcircle of every 
triangle contains no other points. The 
approach can be used to construct 
a polyhedral surface approximation 
from a set of 3D sample points. In the 
figure, the solid lines connecting the 
points are the Delaunay triangulation 
and the dashed lines are the boundaries 
of the Voronoi diagram: [Fau93:10.4.4]


demon: A program that runs in the
background, e.g., performing checks 
or guaranteeing the correct function- 
ing of a module of a complex system. 
[WP:Daemon_(computing)] 


demosaicing: The process of convert- 
ing a one-color-per-pixel image (as cap- 
tured by most digital cameras) into 
a three-color-per-pixel image. [WP: 
Demosaicing] 


Dempster-Shafer: A belief-modeling 
approach for testing a hypothesis that 
allows information, in the form of 
beliefs, to be combined into a plausibil- 
ity measure for that hypothesis. [WP: 
Dempster-Shafer_theory] 


dendrogram: A hierarchical visual rep- 
resentation of relationships, e.g., fea- 
ture correlation or similarity, as a tree 
structure with the leaf nodes all at the 
same depth: [EKSX96] 


denoising: The removal of either struc- 
tured (coherent) or random noise 
from a signal. In image processing, 
denoising approaches are generally 
based on intensity domain convolution 
or time-frequency domain. See image 
denoising. [SB11:4.4] 


dense reconstruction: A class of tech- 
niques estimating depth at each pixel 
of an input image or sequence, thus 
generating a dense sampling of the 3D 
surfaces imaged. This can be achieved,
e.g., by a range sensor or stereo vision.
[ND10]


dense stereo matching: A class of meth- 
ods establishing the correspondence 
(see stereo correspondence problem) 
between all pixels in a stereo pair of 
images. The generated disparity map 
can then be used for depth estimation. 
[KLCL05]


densitometry: A class of techniques that 
estimate the density of a material from 
images, e.g., bone density in the medi- 
cal domain (bone densitometry). [WP: 
Densitometry] 


depth: Distance of scene points from 
the camera center or the camera imag- 
ing plane. In a range image, the inten- 
sity value in the image is a measure of 
depth. [JKS95:13.1] 


depth distortion: Systematically incor- 
rect distance estimates, and conse- 
quently object shape characteristics, 
that arise from errors primarily in the 
intrinsic parameters. [WHR99] 


depth estimation: The process of esti- 
mating the distance between a sen- 
sor (e.g., a stereo pair) and a part of 
the scene being imaged. Stereo vision 
and range sensing are two well-known 
ways to estimate depth. [TO02] 


depth from defocus: A method of 
deriving the depth from parame- 
ters that can be directly measured, 
using the direct relationships among 
the depth, camera parameters and 
the amount of blurring in images. 
[SS94] 


depth from focus: A method of deter- 
mining distance to a point by tak- 
ing many images in better and better 
focus. This is also called “autofocus” 
or “software focus”. [WP:Depth_of_ 
focus] 


depth gradient image: An image
$(d_i(i, j), d_j(i, j))$ which is the gradient
of a depth image $d(i, j)$.

depth image: See range image. 

depth image edge detector: See range 
image edge detector. 


depth map: See range image. 


depth of field: The distance between 
the nearest and the farthest points in 
focus for a given camera: [JKS95:8.3] 


[Figure: a camera, the nearest and furthest points in focus, and the depth of field between them]


depth of focus: The distance between 
the nearest and farthest objects in an 
image that appear to be in focus. Depth 
of focus is affected by the lens aper-
ture, specified by the f-number. See also
depth of field. [Sze10:2.2.3]


depth perception: The ability to per- 
ceive distances from visual stimuli, 
e.g., motion or stereo vision: [WP: 
Depth_perception] 


[Figure: a 3D model seen from two viewpoints, View 1 and View 2]


depth sensor: See range sensor. 


Deriche edge detector: Convolution fil- 
ter for edge finding, similar to the 
Canny edge detector. Deriche uses 
a different optimal operator where 
the filter is assumed to have infinite 
extent. The resulting convolution fil- 
ter is sharper than the derivative of the 
Gaussian that Canny uses: 


$f(x) = A x e^{-\alpha |x|}$


See also edge detection. [Der87] 


derivative-based search: Numerical 
optimization methods assuming 
that the gradient can be estimated. 
An example is the quasi-Newton
approach, which attempts to generate
an estimate of the inverse Hessian 
matrix. This is then used to determine 
the next iteration point: [DM77] 


[Figure: conjugate gradient search path from a start point]


descattering: Algorithmic approaches 
to the removal of image attenuation 
from back-scattering often caused by 
imaging through media such as fog or 
water. Such algorithms often use light 
polarization as a first step and scene 
reconstruction afterwards. [CLFS07] 


description coding: Coding technique 
that splits media into several streams 
which are broadcast simultaneously 
and reconstructed in parallel at the 
receiver end. [Goy01] 


descriptor: See image descriptor and 
shape descriptor. 


detection: 1) Identifying the presence of 
a signal from noise using a system of 
detectors either in software or hard- 
ware. 
2) Identifying the presence of an object 
or features in an image. See also 
object detection and feature detection. 
[Dav90:Ch. 11-12] 


detection rate: 1) The speed of 
detection, measured in hertz. 
2) The success rate of detection as a 
statistic in object detection and feature 
detection as it relates to the true 
positive rate of detection for a given 
instance. [Sze10:Ch. 4; 14.1] 


DFT: See discrete Fourier transform. 


diagram analysis: Syntactic analysis of
images of line drawings, possibly with 
text in a report or other document. 
This field is closely related to the anal- 
ysis of visual languages. [Nag00] 


dichroic filter: A filter that selectively 
transmits light of a given wavelength. 
[WP:Dichroic_filter] 


dichromatic model: A model that states 
that the light reflected from a sur- 
face is the sum of two compo- 
nents: body and interface reflectance. 
Body reflectance follows Lambert’s 
law; interface reflectance models high- 
lights. The model has been applied to 
several computer vision tasks includ- 
ing color constancy, shape recovery 
and color image segmentation. See also 
color. [NN99] 


diffeomorphism: A differentiable one- 
to-one map between manifolds. The 
map has a differentiable inverse. [WP: 
Diffeomorphism] 


difference image: An image computed 
as the pixelwise difference of two 
images; each pixel in the difference 
image is the difference between the 
pixels at the same location in the two 
input images. In the figure, the image 
on the right is the difference of the left 
and middle images (after adding 128 
for display purposes): [Sch89:Ch. 5] 
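
As a rough illustration, a pixelwise difference with a display offset might be computed as below. This is a sketch assuming 8-bit gray-scale images stored as nested lists of equal size; the function name and the clipping to 0..255 are illustrative choices, not taken from the cited source.

```python
def difference_image(a, b, offset=128):
    """Pixelwise a - b, shifted by `offset` and clipped to the range 0..255."""
    return [
        [max(0, min(255, pa - pb + offset)) for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


left = [[10, 200], [50, 0]]
middle = [[10, 50], [60, 200]]
diff = difference_image(left, middle)
```

Unchanged pixels map to the offset value (mid-gray), so positive and negative changes are both visible.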


difference-of-Gaussians operator: A 
convolution operator used to locate 
edges in a gray-scale image using an 
approximation to the Laplacian of 
Gaussian operator. In 2D, the convo- 
lution mask is: 


$$c_1 e^{-\frac{x^2 + y^2}{\sigma_1^2}} - c_2 e^{-\frac{x^2 + y^2}{\sigma_2^2}}$$


where the constants $c_1$ and $c_2$ control
the height of the individual Gaussians
and $\sigma_1$, $\sigma_2$ are the standard deviations.
[CS09:4.5.4] 
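
A small mask of this kind can be tabulated as follows. This is a hedged sketch that assumes each Gaussian has the form c * exp(-(x^2 + y^2) / sigma^2); other texts place a factor of 2 in the denominator or normalize the Gaussians, so the constants here are illustrative. The function name and the odd `size` parameter are assumptions.

```python
import math


def dog_mask(size, sigma1, sigma2, c1=1.0, c2=1.0):
    """Sample a difference of two Gaussians on a size x size grid (size odd)."""
    half = size // 2
    mask = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            r2 = x * x + y * y
            narrow = c1 * math.exp(-r2 / sigma1 ** 2)
            wide = c2 * math.exp(-r2 / sigma2 ** 2)
            row.append(narrow - wide)
        mask.append(row)
    return mask


mask = dog_mask(5, sigma1=1.0, sigma2=2.0)
```

Convolving an image with such a mask and locating zero crossings of the response is the usual way the operator is applied to edge finding.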


differential geometry: A field of 
mathematics that studies the local 
derivative-based properties of curves 
and surfaces, e.g., tangent plane and
curvature. [TV98:A.5] 


differential invariant: Image descrip- 
tors that are invariant under geometric 
transformations. Invariant descriptors 
are generally classified as global invari- 
ants (corresponding to object prim- 
itives) and local invariants (typically 
based on derivatives of the image func- 
tion). The image function is always 
assumed to be continuous and differ- 
entiable. [WP:Differential_invariant] 


differential pulse code modulation: A 
technique for converting an analog sig- 
nal to binary by sampling it, express- 
ing the value of the sampled data mod- 
ulation in binary and then reducing 
the bit rate by taking account of the 
fact that consecutive samples do not 
change much. [Jai89:11.3] 
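
The core idea, coding the change between consecutive samples rather than the samples themselves, can be sketched as below. This simplified version assumes integer samples, a previous-sample predictor and no quantization of the differences, so it is lossless; practical DPCM coders quantize the differences to reduce the bit rate. The function names are illustrative.

```python
def dpcm_encode(samples):
    """Replace each sample by its difference from the previous sample."""
    previous = 0
    differences = []
    for sample in samples:
        differences.append(sample - previous)
        previous = sample
    return differences


def dpcm_decode(differences):
    """Rebuild the samples by accumulating the differences."""
    previous = 0
    samples = []
    for difference in differences:
        previous += difference
        samples.append(previous)
    return samples


signal = [100, 102, 103, 103, 101]
encoded = dpcm_encode(signal)
```

After the first value, the encoded stream contains only small numbers, which need fewer bits than the raw samples.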


differentiation filtering: See gradient 
filter. 


diffraction: The bending of light rays 
at the edge of an object or through a 
transparent medium. The amount by 
which a ray is bent is dependent on 
wavelength. [Nal93:2.1.4] 


diffraction grating: An array of diffract- 
ing elements that has the effect of 
producing periodic alterations in a 
wave’s phase, amplitude or both. The 
simplest arrangement is an array of 
slits (see moiré interferometry): [WP:
Diffraction_grating]

[Figure: diffraction orders m = 0, m = 1 and m = 2]


diffraction limit: The fundamental max-
imum image resolution of an imaging
or optical system. This limit is affected 
by the size of the objective lens and the 
observed wavelength. [Hec87:p. 148] 


diffuse illumination: Light energy that 
comes from a multitude of directions,
hence not causing significant shad-
ing or shadow effects. The oppo-
site of diffuse illumination is directed
illumination. [CF90] 


diffuse reflection: Scattering of light by
a surface in many directions:

[Figure: light reflected from a surface in many directions]

Ideal Lambertian surface diffusion
results in the same energy being
reflected in every direction regardless
of the direction of the incoming light
energy. [WP:Diffuse_reflection]


diffusion filtering: Image denoising 
approach based on nonlinear evolution 
partial differential equations which 
seeks to improve images qualitatively 
by removing noise while preserving 
details and even enhancing edges. See 
also anisotropic filtering. [PM90] 


diffusion MRI tractography: In vivo 
magnetic resonance imaging that ana- 
lyzes the diffusion tensor of the return 
signal to infer the connectivity in the 
brain neural tracts. [Jon11] 


diffusion smoothing: A technique 
achieving Gaussian smoothing as the 
solution of a diffusion equation with 
the image to be filtered as the ini- 
tial boundary condition. The advan- 
tage is that, unlike repeated averaging, 
diffusion smoothing allows the con- 
struction of a continuous scale space. 
[CLMC92] 


diffusion tensor imaging: Magnetic 
resonance imaging resolution 
enhancement regime that extracts 
the return signal anisotropic diffusion 
tensor and uses it to infer information 
about tissue direction and connectiv-
ity. See also diffusion MRI tractography.
[Jon11]


digital camera: A camera in which 
the image sensing surface is made up 
of individual semiconductor sampling 
elements (typically one per pixel of
the image); quantized versions of the
sensed values are recorded when 
an image is captured. [WP:Digital_ 
camera] 


digital elevation map: A sampled and 
quantized map where every point 
represents a height above a refer- 
ence ground plane (i.e., the elevation): 
[OM84] 




digital geometry: Geometry (points, 
lines, angles, surfaces etc.) in a sampled
and quantized domain. [WP:Digital_ 
geometry] 


digital image: Any sampled and quan- 
tized image: [Umb98:1.7] 


digital image forensics: 1) Methods
of detecting specific forms of tamper- 
ing or unauthorized changes to digi- 
tal images for content verification. See 
digital image watermarking. 
2) Image analysis methods used in 
the collection of evidence for criminal 
investigation. [SM12] 


digital image processing: Image 
processing restricted to the domain 
of digital images. [WP:Digital_image_ 
processing] 


digital image watermarking: Embed- 
ding a code (the watermark) into the 
data of an image. The watermark acts 
as a digital signature, identifying the 
image’s ownership or authenticity. It
may be visible or invisible (hidden
watermarking) to the viewer. See also
image steganography. [WP:Digital_
watermarking]

[Figure: an image with a digital watermark]


digital panoramic radiography: Den-
tal wraparound digital X-ray imaging
system which can image the upper
and lower jaws, sinuses and nasal cav-
ity. See also digital radiography. [WP:
Panoramic_radiograph]


digital radiography: X-ray imaging 
where digital sensors replace photo- 
graphic film or plates. [WP:Digital_ 
radiography] 


digital signal processor: A class of co- 
processors designed to execute pro- 
cessing operations on digitized signals 
efficiently. A common characteristic is 
the provision of a fast multiply and 
accumulate function, e.g., a ← a +
b × c. [WP:Digital_signal_processor]


digital subtraction angiography: A 
basic technique used in medical 
image processing to detect, visualize 
and inspect blood vessels, based on 
the subtraction of a background image 
from the target image, usually where 
the blood vessels are made more visi- 
ble by using an X-ray contrast medium. 
See also medical image registration. 
[WP:Digital_subtraction_angiography] 


digital terrain map: See digital
elevation map.


digital topology: Topology (i.e., how
things are connected or arranged) in a
digital domain (e.g., in a digital image).
See also connectivity. [WP:Digital_
topology]


digital watermarking: The process of
embedding a signature (a watermark)
into digital data. In the domain of
digital images this is most normally
done for copyright protection. The dig-
ital watermark may be invisible or visi-
ble: [WP:Digital_watermarking]

[Figure: an image carrying a visible watermark]


digitally reconstructed radiograph
(DRR): The approximation of an X-
ray image from the equivalent CT data.
[MBG+00]


digitization: The process of making
a sampled digital version of some
analog signal (such as an image).
[WP:Digitizing]


dihedral edge: The edge made by two
planar surfaces. A “fold” in a surface:
[HH89]


dilate operator: The operation of
expanding a binary or gray-scale object
with respect to the background. This
has the effect of filling in any small
holes in the object and joining any
object regions that are close together:

Most frequently described as a
morphological transformation.
The dual of the erode operator.
[Umb98:2.4.6]
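
Binary dilation can be sketched in a few lines. This example assumes a binary image stored as nested lists of 0 and 1 and a 3 x 3 square structuring element; both the representation and the function name are illustrative assumptions rather than a prescribed implementation.

```python
def dilate(image):
    """Binary dilation with a 3x3 square structuring element.

    An output pixel is set if any pixel in its 3x3 neighborhood is set.
    Pixels outside the image are treated as background.
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < height and 0 <= cc < width and image[rr][cc]:
                        out[r][c] = 1
    return out


# A ring-shaped object with a one-pixel hole at its center.
ring = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
grown = dilate(ring)
```

In this example the object grows by one pixel in every direction and the central hole is filled.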


dimensionality: The number of dimen- 
sions that need to be considered. For 
example, 3D object location is often 
considered as a seven-dimensional
problem (three dimensions for posi-
tion, three for orientation and one for
the object scale). See also intrinsic 
dimensionality. [SQ04:18.3.2] 


dimensionality reduction: If high-
dimensional data has manifold struc-
ture, then it can be approximated in
a lower-dimensional space. Principal 
component analysis (PCA) is the 
standard linear method for this 
task, but there are also many non- 
linear methods including Gaussian 
process latent variable model, isomap, 
kernel principal component analysis, 
Kohonen networks, and locally linear 
embedding. [Mur12:1.4.2] 


direct least square fitting: Direct fitting
of a model to some data by a method
that has a closed form or globally con- 
vergent solution. [FPF99] 


directed acyclic graph (DAG): A graph
containing only directed edges. If there
is an edge between a pair of vertices u
and v it must be either u → v or v → u,
but not both. Also there are no closed
paths (hence “acyclic”):


DAGs are used in the definition of 
directed graphical models. [Bis06:8.1] 


directed graph: A graph in which
the arcs go in only one direction,
in contrast to an undirected graph. 
OnTopOf(A,B), meaning A is on top of 
B, is a property that could be used in 
a directed graph. On the other hand, 
adjacency is a property that could be 
used in an undirected graph, adj(A,B), 
meaning region A is adjacent to region 
B also implies adj(B,A), i.e., region B is 
adjacent to region A. [Wei12:Directed 
Graph] 


directed graphical model (DGM): A
joint probability distribution over a
set of random variables. The graphical
structure is a directed acyclic graph
(DAG), with each node in the graph
representing a random variable. The
joint distribution for a set of variables
$\mathbf{x}$ is given by $p(\mathbf{x}) = \prod_i p(x_i | \mathrm{Pa}_i)$,
where $\mathrm{Pa}_i$ denotes the parents of node
i in the graph. Missing edges in the 
graph imply conditional independence 
relationships. For example, in the fig- 
ure, the factorization is p(x, ...,%7) = 
POD PC) POD) PO%4| , X6) POX) 


x pX) POWs |X 2, X4, X7): 


Sometimes the directed arrows can 
be interpreted as representing causal 
relationships (see probabilistic causal 
model), although there is nothing 
inherently causal about DGMs. The 
DGM is sometimes referred to as a 
Bayesian network or belief network. 
[Mur12:6.2] 
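
The product-of-conditionals form of the joint distribution can be evaluated mechanically. The sketch below uses a hypothetical three-variable network (a and b are parentless, c has parents a and b) with made-up conditional probability tables; the data layout and function name are assumptions chosen for illustration.

```python
from itertools import product


def joint_probability(assignment, parents, cpts):
    """Multiply p(x_i | parents of x_i) over all variables in the assignment."""
    p = 1.0
    for var, value in assignment.items():
        parent_values = tuple(assignment[q] for q in parents[var])
        p *= cpts[var][parent_values][value]
    return p


# Hypothetical network: a -> c <- b, all variables binary.
parents = {"a": (), "b": (), "c": ("a", "b")}
cpts = {
    "a": {(): {0: 0.7, 1: 0.3}},
    "b": {(): {0: 0.6, 1: 0.4}},
    "c": {
        (0, 0): {0: 0.9, 1: 0.1},
        (0, 1): {0: 0.5, 1: 0.5},
        (1, 0): {0: 0.4, 1: 0.6},
        (1, 1): {0: 0.1, 1: 0.9},
    },
}

p_example = joint_probability({"a": 1, "b": 0, "c": 1}, parents, cpts)
total = sum(
    joint_probability({"a": a, "b": b, "c": c}, parents, cpts)
    for a, b, c in product((0, 1), repeat=3)
)
```

Here the missing edge between a and b encodes that the two are marginally independent, so only p(a), p(b) and p(c | a, b) need to be stored.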


directed illumination: Light energy
that comes from a particular direc-
tion hence causing relatively sharp
shadows. The opposite is diffuse 
illumination. [RBV06] 


direction: A vector that describes the rel-
ative position of one point with respect
to another. 


directional derivative: A derivative
taken in a specific direction, e.g.,
the component of the gradient along
one coordinate axis. In the figure,
the images on the right are the verti-
cal and horizontal directional deriva-
tives of the image on the left: [WP:
Directional_derivative]


Dirichlet distribution: Let x denote a 
d-dimensional probability vector, i.e., a 
vector with all elements non-negative 
and summing to 1. The Dirichlet distri- 
bution is a probability distribution over 
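probability vectors given by

$$p(\mathbf{x}) = \frac{\Gamma(\alpha_1 + \cdots + \alpha_d)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_d)} \prod_{i=1}^{d} x_i^{\alpha_i - 1},$$

where $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_d)$ is a param-
eter vector with positive entries.
[Bis06:2.2.1]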
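
The density can be evaluated numerically as sketched below, assuming the standard form: a ratio of gamma functions multiplied by the product of x_i raised to (alpha_i - 1). The function name is illustrative, and the sketch assumes every entry of x is strictly positive so that the logarithms are defined.

```python
import math


def dirichlet_pdf(x, alpha):
    """Dirichlet density at probability vector x with parameter vector alpha."""
    # Work in log space to avoid overflow in the gamma functions.
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    log_kernel = sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))
    return math.exp(log_norm + log_kernel)


uniform_density = dirichlet_pdf((0.2, 0.3, 0.5), (1.0, 1.0, 1.0))
beta_density = dirichlet_pdf((0.25, 0.75), (2.0, 1.0))
```

With all parameters equal to 1 the distribution is uniform over the simplex; with d = 2 it reduces to the beta distribution.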


Dirichlet prior: In Bayesian statistics, 
the Dirichlet distribution is a conjugate 
prior distribution to the multinomial 
distribution. [Bis06:2.2.1]


Dirichlet process mixture model: A 
non-parametric method based on a 
mixture model with a potentially infi- 
nite number of components. A Dirich- 
let process is a generalization of the 
Dirichlet distribution. [Mur12:23.2] 


discontinuity detection: See edge 
detection. 


discontinuity preserving regulariza- 
tion: A method of preserving edges 
(discontinuities) from being blurred as 
a result of some regularization opera- 
tion (such as the recovery of a dense 
disparity map from a sparse set of dis- 
parities computed at matching feature 
points). [SSD94] 


discontinuous event tracking: Track- 
ing of events (such as a moving person) 
through a sequence of images. The dis- 
continuous nature of the tracking is 
caused by the distance that a person (or 
a hand, arm etc.) can travel between 
frames and also by the possibility of 
occlusion (or self-occlusion): [LCST06] 


discrete cosine transform (DCT): A
transformation that converts digital
images into the frequency domain in
terms of the coefficients of discrete
cosine functions. Used within JPEG
image compression. [Umb98:2.5.2]
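
For illustration, a one-dimensional transform of this kind can be written directly from its definition. The sketch below assumes the orthonormal DCT-II variant; image codecs apply a two-dimensional version to pixel blocks, which this example does not attempt to reproduce. The function name is an assumption.

```python
import math


def dct_1d(signal):
    """Orthonormal DCT-II of a sequence of numbers."""
    n = len(signal)
    coefficients = []
    for k in range(n):
        total = sum(
            signal[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)
        )
        # Scaling that makes the transform orthonormal (energy preserving).
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coefficients.append(scale * total)
    return coefficients


flat = dct_1d([1.0, 1.0, 1.0, 1.0])
ramp = dct_1d([1.0, 2.0, 3.0, 4.0])
```

A constant signal puts all of its energy into the first (zero-frequency) coefficient, which is why smooth image regions compress well.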


discrete curve evolution: A method 
of automatic stepwise simplification of 
polygonal curves which can neglect 
minor distortions while preserving 
the perceptual appearance. See curve 
evolution. 


discrete Fourier transform (DFT): A 
version of the Fourier transform for 
sampled data. [Umb98:2.5.1] 


discrete random variable: A random 
variable X is discrete if it only takes 
on values in some countable subset 
{x_1, x_2, ...} of R. Contrast with a
continuous random variable. Examples 
of discrete random variables include 
the Bernoulli distribution and the 
Poisson distribution. [GS92:2.3] 


discrete relaxation: A technique for 
labeling objects in which the possible 
type of each object is iteratively con- 
strained based on relationships with 
other objects in the scene. The aim is 
to obtain a globally consistent interpre- 
tation (if possible) from locally consis- 
tent relationships. [WH97] 


discrimination function: A binary 
function separating data into two 
classes. See classifier. [DH73:2.5.1] 


discrimination-generating Hough 
transform: A hierarchical algorithm 
to discriminate among objects and to 
detect object rotation and translation 
using projections and slices through 
a constructed Hough space. See 
generalized Hough transform. 


discriminative graphical model: A 
probabilistic modeling framework for 
specifying a collection of statistical, 
relational models. May be applied to 
image data to derive probabilistic infer- 
ences. Also see discriminative model 
and graphical model. [ZBR+01] 


discriminative model: A machine 
learning representation that models 
the dependence of an unobserved vari- 
able on an observed one. This is in 
contrast to a generative model which 
can be used to predict the unob-
served using only knowledge about
the observed provided the joint dis-
tribution relationship between them is
known. [Bar12:23.3] 


discriminative random field: A 
framework for the classification of 
image regions which incorporates 
neighborhood interactions in the 
labels as well as the observed data. 
[KH03] 


disjoint view: When images are cap- 
tured from different camera (or sen- 
sor) viewpoints with non-overlapping 
fields of view. [AC09] 


disparity: The image distance shifted 
between corresponding points in 
stereo image pairs: [JKS95:11.1] 


[Figure: left image features, right image features and their disparity]


disparity gradient: The gradient of a
disparity map for a stereo pair, that 
estimates the surface slope at each 
image point. See also binocular stereo. 
[Fau93:6.2.5] 


disparity gradient limit: The maximum 
allowed disparity gradient in a poten- 
tial stereo feature match. [PMF85] 


disparity limit: The maximum allowed 
disparity in a potential stereo feature 
match. The notion of a disparity limit 
is supported by evidence from the 
human visual system. [San88] 


dispersion: Scattering of light by the 
medium through which it is traveling. 
[WP:Dispersion_(optics)] 


displacement vector: The shortest path 
between the endpoints of a motion, 
irrespective of the path taken. See also 
translation: [Ros09] 


[Figure: a displacement vector]


dissimilarity metric: A measure of the 
degree to which two objects or struc- 
tures are different. Often framed as a 
Euclidean distance between two fea- 
ture values. [GL86] 


distance function: See distance metric.


distance map: See range image.


distance metric: A measure of how 
far apart two things are in terms 
of physical distance or similar- 
ity. A metric can be other func- 
tions besides the standard Euclidean 
distance, such as the algebraic distance 
or Mahalanobis distance. A true metric 
must satisfy: 


• d(x, y) + d(y, z) ≥ d(x, z)
• d(x, y) = d(y, x)
• d(x, x) = 0
• d(x, y) = 0 implies x = y

but computer vision processes often 
use functions that do not satisfy all of 
these criteria. [JKS95:2.5.8] 


distance transform: An image- 
processing operation normally applied 
to binary images in which every object 
point is transformed into a value 
representing the distance from the 
point to the nearest object boundary. 
This operation is also referred to as 
chamfering (see chamfer matching): 
[JKS95:2.5.9] 
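
A two-pass computation of such a transform is sketched below. It assumes a binary image stored as nested lists (1 for object, 0 for background) and the city-block distance to the nearest background pixel; other distance measures need different masks. The function name is illustrative.

```python
def distance_transform(image):
    """City-block distance from each object pixel to the nearest background pixel."""
    height, width = len(image), len(image[0])
    far = height + width  # larger than any city-block distance inside the image
    dist = [[far if image[r][c] else 0 for c in range(width)] for r in range(height)]
    # Forward pass: propagate distances from the top and left neighbors.
    for r in range(height):
        for c in range(width):
            if dist[r][c]:
                if r > 0:
                    dist[r][c] = min(dist[r][c], dist[r - 1][c] + 1)
                if c > 0:
                    dist[r][c] = min(dist[r][c], dist[r][c - 1] + 1)
    # Backward pass: propagate distances from the bottom and right neighbors.
    for r in range(height - 1, -1, -1):
        for c in range(width - 1, -1, -1):
            if dist[r][c]:
                if r < height - 1:
                    dist[r][c] = min(dist[r][c], dist[r + 1][c] + 1)
                if c < width - 1:
                    dist[r][c] = min(dist[r][c], dist[r][c + 1] + 1)
    return dist


square = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
result = distance_transform(square)
```

If the image contains no background pixel at all, every value stays at the placeholder `far`.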


distortion coefficient: A coefficient in
a given image distortion model, e.g.,
$k_1$, $k_2$ in the distortion polynomial.
See also pincushion distortion, barrel 
distortion. [NSNI92] 


distortion polynomial: A polynomial
model of radial lens distortion. A
common example is $x = x_d(1 + k_1 r^2 + k_2 r^4)$,
$y = y_d(1 + k_1 r^2 + k_2 r^4)$. Here, $x$
and $y$ are the undistorted image coor-
dinates, $x_d$ and $y_d$ are the distorted
image coordinates, $r^2 = x_d^2 + y_d^2$, and
$k_1$ and $k_2$ are the distortion coefficients.
Usually $k_2$ is significantly smaller than
$k_1$; it can be set to 0 in cases
where high accuracy is not required.
[NSNI92]
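
Applying a radial model of this kind to a single point is a one-line computation. The sketch below assumes the two-coefficient form in which the undistorted coordinates equal the distorted ones scaled by (1 + k1 r^2 + k2 r^4), with r measured from the distortion center at the origin; the function name and coefficient values are illustrative.

```python
def undistort_point(xd, yd, k1, k2=0.0):
    """Map distorted coordinates (xd, yd) to undistorted ones.

    Coordinates are assumed to be relative to the distortion center.
    """
    r2 = xd * xd + yd * yd
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return xd * factor, yd * factor


x, y = undistort_point(1.0, 0.0, k1=0.1, k2=0.01)
```

Setting `k2` to zero gives the cheaper single-coefficient model mentioned in the entry.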


distortion suppression: Correction of 
image distortions (such as nonlin- 
earities introduced by a lens). See 
geometric distortion and geometric 
transformation. [DV95] 


distributed behavior: Agent-based 
approach to e.g., optimization where 
there is a direct analogy between 
points visited in the search space and 
independent behaviorally motivated 
agents. [WB02b]


distributed camera network: A collec- 
tion of loosely coupled cameras and 
processing nodes, spread over a wide 
geographical area with no centralized 
processor, often with limited ability to 
communicate. See also disjoint views. 
[AC09]


distribution overlap: The ambiguous 
region of classification between two 
datasets where points may reasonably 
belong to either distribution: [VFJZ01] 


[Figure: two distributions and their region of overlap]


dithering: A technique simulating the 
appearance of different shades or col- 
ors by varying the pattern of black and 
white (or different color) dots. This 
is a common task for inkjet printers: 
[Low91:4.3.5] 


divide and conquer: A technique for
solving problems efficiently by sub- 
dividing the problem into smaller 
subproblems, and then recursively 
solving these subproblems in the 
expectation that the smaller problems 
will be easier to solve. An exam- 
ple is an algorithm for deriving a 
polygonal approximation of a contour 
in which a straight line estimate is 
recursively split in the middle (into two
segments with the midpoint exactly 
on the contour) until the distance 
between the polygonal representation 
and the actual contour is below some 
tolerance: [WP:Divide_and_conquer_ 
algorithm] 
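
The polygonal approximation example can be sketched as a short recursion. The code assumes an open contour given as an ordered list of (x, y) points and splits each segment at the middle index of its span; the function names and the tolerance parameter are illustrative.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the line segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length2 = dx * dx + dy * dy
    if length2 == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def approximate(contour, tolerance, lo=0, hi=None):
    """Return indices of contour points forming the polygonal approximation."""
    if hi is None:
        hi = len(contour) - 1
    worst = max(
        (point_segment_distance(contour[i], contour[lo], contour[hi])
         for i in range(lo + 1, hi)),
        default=0.0,
    )
    if worst <= tolerance:
        return [lo, hi]
    mid = (lo + hi) // 2  # split in the middle; the midpoint lies on the contour
    left = approximate(contour, tolerance, lo, mid)
    right = approximate(contour, tolerance, mid, hi)
    return left[:-1] + right  # drop the duplicated midpoint index


corner = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
vertices = approximate(corner, tolerance=0.1)
```

Splitting at the point of greatest deviation instead of the middle is a common alternative that usually yields fewer vertices.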


divisive clustering: Clustering or clus-
ter analysis in which all items are ini-
tially considered as a single set (a clus-
ter) and are divided into component
subsets (clusters). [DMK03] 


DIVX: An MPEG-4 video compression 
technology aiming to achieve suffi- 
ciently high compression to enable 
transfer of digital video contents over 
the Internet, while maintaining high 
visual quality. [WP:DivX] 


document analysis: A general term 
describing operations that attempt to 
derive information from images of 
documents (including e.g., character 
recognition and document mosaicing). 
[DLL03] 


document mosaicing: Image mosaicing 
of documents. [ZGT99] 


document retrieval: Identification of a 
document in a database of scanned 
documents based on some criteria. 
[WP:Document_retrieval] 


DoG: See difference-of-Gaussians
operator.


dominant color descriptor (DCD): 
A basic color descriptor used in the
MPEG-7 coding standard widely used
for image retrieval. It is most suitable 
for representing local features where a 
small number of colors are enough to 
characterize the color information in 
the region of interest. [Mar03:3.2.2.3] 


dominant motion direction: 1) In 
sensor motion compensation, the 
motion parameters which describe the 
greatest change in the image geometry 
(see also image stabilization). 
2) The most probable direction of 
travel of a large number of freely- 
moving objects, such as a crowd. 
[SAG95] 


dominant plane: A degenerate case 
encountered in uncalibrated structure 
and motion recovery where most or all 
of the tracked image features are copla- 
nar in the scene. [OI06] 


Doppler: A physics phenomenon 
whereby an instrument receiving 
acoustic or electromagnetic waves 
from a source in relative motion 
measures an increasing frequency 
if the source is approaching and a 
decreasing frequency if it is reced- 
ing. The acoustic Doppler effect is 
employed in sonar sensors to estimate 
target velocity as well as position. 
[WP:Doppler_effect] 


downsampling: A reduction in the sam- 
pling rate of a signal or image, usually 
to reduce the amount of data trans- 
ported. Also known as subsampling. In 
images, this often means using fewer 
pixels to represent the same underly- 
ing data: [SB11:Ch. 2] 


[Figure: an original image and its downsampled equivalent]


downhill simplex: A method for find- 
ing a local minimum using a sim- 
plex (a geometrical figure specified by 
N + 1 vertices) to bound the optimal 
position in an N-dimensional space. 
See also optimization. [WP:Nelder- 
Mead_method] 


DSP: See digital signal processor. [WP: 
Digital_signal_processing] 


dual of the image of the absolute 
conic (DIAC): If $\omega$ is the matrix rep-
resenting the image of the absolute
conic, then $\omega^{-1}$ represents its dual
(DIAC). Calibration constraints are 
sometimes more readily expressed in 
terms of the DIAC than the IAC. 
[HZ00:7.5] 


dual quaternion: A dual quaternion 
is an ordered pair of quaternions 
which can be used to represent spatial 
rigid body displacements in 3D. [WP: 
Dual_quaternion] 


dual-tree wavelet: Also dual-tree com-
plex wavelet transform (DTCWT).
An almost shift-invariant wavelet
transform that still allows for perfect
signal reconstruction. It calculates the
discrete wavelet transform (DWT) in two
decompositions (known as tree-a and
tree-b), one of which may be real and
one imaginary. [SBK05]


duality: The property of two concepts or 
theories having similar properties that 
can be applied to the one or to the 
other. For instance, several relations 
linking points in a projective space are 
formally the same as those linking lines 
in a projective space; such relations are 
dual. [Fau93:2.4.1] 


duplicate image retrieval: The process 
of detecting images with the same, or 
approximately the same content, irre- 
spective of geometry, compression or 
image encoding used in storage. The 
degree of similarity may be computed 
on pixels or features or a combination 
of both. This is a specific case of image 
retrieval. [KSH05]


dust filtering: The process of apply- 
ing operators for the detection or 
removal of primarily salt-and-pepper 
noise. The defining noise charac- 
teristic of dust is that the inten- 
sity value of noisy pixels bears no 
relation to the surrounding pixel 
neighborhood. In the figure, (a) the 
original image has a dust artifact which 
(b) shows up clearly when analyzing 
entropy in the high frequency DCT 
components: 


dynamic appearance model: A model
describing the changing appearance of
an object or scene over time. [KL09]


dynamic Bayesian network (DBN): A 
directed graphical model or Bayesian 
network to represent a process evolv- 
ing in time. A DBN is typically speci- 
fied by an initial state distribution and 
a transition model, specifying the evo- 
lution of the system from the present 
time into the future, e.g., from time 
step t to t + 1 for a discrete-time sys-
tem. A hidden Markov model (HMM) is
a simple example of a DBN. [KF09:6.2] 


dynamic correlation: A Monte Carlo 
stochastic modeling and identification 
framework, often used in financial 
modeling to identify trends. [MB73] 


dynamic event: An event that is 
extended over time, rather than being 
instantaneous. [VSK05] 


dynamic occlusion: In object tracking, 
the phenomenon of objects becom- 
ing partially or fully hidden from view 
(i.e., occlusion) either by moving out 
of frame or by moving behind other 
objects, causing them to appear to 
split or merge within the scene. In 
a distributed camera network, this 
problem can be mitigated by polling 
images from many views simultane- 
ously. [TMB85] 


dynamic programming: An approach 
to numerical optimization in which 
an optimal solution is searched for 
by keeping several competing partial 
paths throughout and pruning alterna- 
tive paths that reach the same point 
with a suboptimal value. [Nal93:7.2.2] 


dynamic range: The ratio of the bright- 
est and darkest values in an image. Most 
digital images have a dynamic range of 
around 100:1 but humans can perceive 
detail in dark regions when the range 
is even 10,000:1. To allow for this we
can create high-dynamic-range images: 
[SQ04:4.2.1] 


dynamic scene: A scene in which some 
objects move, in contrast to the com- 
mon assumption in shape from motion 
that the scene is rigid and only the cam- 
era is moving. [BF93] 


dynamic stereo: Stereo vision for a 
moving observer. This allows shape 
from motion techniques to be used 
in addition to stereo techniques. 
[GT95] 


dynamic texture: Moving sequences 
of images whose temporal character- 
istics exhibit certain statistically sim- 
ilar properties or stationarity of a 
recognizable texture pattern. Exam- 
ple sequences include fire, waves, 
smoke and moving foliage. Simulation 
of dynamic textures is often done by 
approximating a model of maximum 
likelihood and using it to extrapolate. 
[Sze10:13.5.1] 


dynamic time warping: A technique 
for matching a sequence of observa- 
tions (usually one per time sample) to 
a model sequence of features, where 
the hope is for a one-to-one match of 
observations to features. Because of 
variations in the rate at which obser- 
vations are produced, some features 
may get skipped or others may match 
to more than one observation. The 
usual goal is to minimize the amount of 
skipping or multiple samples matched 
(time warping). Efficient algorithms to 
solve this problem are based on the lin- 
ear ordering of the sequences. See also 
hidden Markov model (HMM). [WP: 
Dynamic_time_warping] 
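
A minimal sketch of one common formulation of dynamic time warping is given below. It assumes scalar observations and features, uses the absolute difference as the local cost and minimizes the summed cost of the alignment; the function name `dtw_distance` is hypothetical. The table is filled in sequence order, which is where the linear ordering of the sequences is exploited.

```python
def dtw_distance(observations, model):
    """Cumulative cost of the best monotonic alignment of two sequences."""
    n, m = len(observations), len(model)
    inf = float("inf")
    # cost[i][j]: best cost of aligning the first i observations
    # with the first j model features.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(observations[i - 1] - model[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # one-to-one match
                cost[i - 1][j],      # feature matched to several observations
                cost[i][j - 1],      # observation matched to several features
            )
    return cost[n][m]


# A repeated observation is absorbed by the warping at no cost.
print(dtw_distance([1, 1, 2, 3], [1, 2, 3]))  # 0.0
```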


dynamic topic model: In large docu- 
ment (or article) collections, a fam- 
ily of probabilistic time series models 
that help to analyze the time evolu- 
tion of different topics. Also used as an 
adaptive behavior model in behavior 
analysis. [GX11:Ch. 9]


early vision: A general term referring
to the initial stages of computer
vision (i.e., image capture and image 
processing). Also known as low-level 
vision. [Hor86:1.4] 


earth mover’s distance: A metric for 
comparing two distributions by evalu- 
ating the minimum cost of transform- 
ing one distribution into the other 
(e.g., can be applied to color histogram 
matching). [FP03:25.2.2]

[Figure: Distribution 1, Distribution 2 and the transformation between them]
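
For the special case of two one-dimensional histograms over the same equally spaced bins and with equal total mass, the earth mover's distance reduces to a running sum of the surplus carried from bin to bin. The sketch below assumes that special case and a ground distance of one unit between adjacent bins; the name `emd_1d` is hypothetical.

```python
def emd_1d(hist_a, hist_b):
    """Earth mover's distance between two 1D histograms on the same bins.

    Assumes equal total mass and unit distance between adjacent bins.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(hist_a) - sum(hist_b)) > 1e-9:
        raise ValueError("histograms must have equal total mass")
    carried = 0.0  # mass that still has to be moved past the current bin
    total = 0.0
    for a, b in zip(hist_a, hist_b):
        carried += a - b
        total += abs(carried)
    return total


# Moving one unit of mass across two bins costs 2.
print(emd_1d([1, 0, 0], [0, 0, 1]))  # 2.0
```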


eccentricity: A shape representation
that measures how non-circular a
shape is. One way of computing this
is to take the ratio of the maximum
chord length of the shape to the maximum
length of any orthogonal chord.
[WP:Eccentricity_(mathematics)]


eccentricity transform: Used to compute
multi-scale invariant feature
descriptors for shapes. Descriptors are
defined as histograms of the eccentricity
transform of a scale-space representation
of the shape. Originally from
the domain of graph theory, it is computed
per-pixel as a geodesic inner
distance between the pixel and some
other point of importance. [IAP+08]


echocardiography: Cardiac ultrasonic 
imaging (ultrasonography), a non- 
invasive technique for imaging the 
heart and surrounding structures. Gen- 
erally used to evaluate cardiac cham- 
ber size, wall thickness, wall motion, 
valve configuration and motion, and 
the proximal great vessels. [WP: 
Echocardiography] 


edge: A sharp variation of the intensity 
function, represented by its position, 
the magnitude of the intensity gradi- 
ent and the direction of the maximum 
intensity variation. [FP03:Ch. 8] 


edge-based segmentation: 
Segmentation of an image based 
on edge detection. [BS00b]


edge-based stereo: A type of 
feature-based stereo where the 
features used are edges. [Nal93:7.2.2] 


edge detection: An image-processing 
operation that computes edge vectors 
(gradient and orientation) for every 
point in an image. The first stage of
edge-based segmentation. Examples
include the Canny edge detector and
the Sobel edge detector. [FP03:8.3]
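
The following sketch illustrates the idea with the Sobel operator, one of the detectors named above. It assumes the image is a list of rows of numbers, computes the gradient magnitude and orientation at interior pixels only and leaves border pixels at zero; the function name `sobel_edges` is hypothetical.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_edges(image):
    """Return (magnitude, orientation) grids for a grayscale image."""
    rows, cols = len(image), len(image[0])
    magnitude = [[0.0] * cols for _ in range(rows)]
    orientation = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = sum(SOBEL_X[i][j] * image[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            gy = sum(SOBEL_Y[i][j] * image[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            magnitude[r][c] = math.hypot(gx, gy)
            orientation[r][c] = math.atan2(gy, gx)  # radians
    return magnitude, orientation


# A vertical step edge: dark on the left, bright on the right.
image = [[0, 0, 10, 10] for _ in range(4)]
magnitude, orientation = sobel_edges(image)
print(magnitude[1][1], orientation[1][1])  # 40.0 0.0
```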


edge direction: The direction perpendicular
to the normal to an edge, that
is, the direction along the edge, par- 
allel to the lines of constant intensity. 
Alternatively, the normal direction to 
the edge, i.e., the direction of maxi- 
mum intensity change (gradient). See 
also edge detection and edge point. 
[TV98:4.2.2] 


edge enhancement: An image 
enhancement operation that makes 
the gradient of edges steeper. This 
can be achieved, e.g., by adding some 
multiple of a Laplacian convolved version
of the image $L(i, j)$ to the image
$g(i, j)$: $f(i, j) = g(i, j) + \lambda L(i, j)$,
where $f(i, j)$ is the enhanced image
and $\lambda$ is some constant. [Sch89:Ch. 4]
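
A small sketch of Laplacian-based edge enhancement follows. It assumes a 3 x 3 kernel with a positive center (the negative of the usual discrete Laplacian), so that adding a positive multiple steepens the edge; with the opposite sign convention the constant would be negative. Border pixels are left unchanged and the name `enhance_edges` is hypothetical.

```python
# Negated discrete Laplacian (assumed sign convention, positive center).
NEG_LAPLACIAN = [[0, -1, 0], [-1, 4, -1], [0, -1, 0]]


def enhance_edges(g, lam=0.5):
    """Return f = g + lam * L, where L is g convolved with NEG_LAPLACIAN."""
    rows, cols = len(g), len(g[0])
    f = [[float(v) for v in row] for row in g]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            lap = sum(NEG_LAPLACIAN[i][j] * g[r - 1 + i][c - 1 + j]
                      for i in range(3) for j in range(3))
            f[r][c] = g[r][c] + lam * lap
    return f


# A step from 10 to 20 becomes a step from 5 to 25 at the interior pixels.
g = [[10, 10, 20, 20] for _ in range(4)]
f = enhance_edges(g)
print(f[1][1], f[1][2])  # 5.0 25.0
```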


edge finding: See edge detection.


edge following: See edge tracking.


edge gradient image: See edge image. 
[WP:Image_gradient] 


edge grouping: See edge tracking. 


edge histogram descriptor (EHD): 
Part of the compact set of MPEG-7 
descriptors designed to capture local- 
ized image texture via a histogram of 
edge orientation. The image is divided 
into 4 x 4 sub-regions each generat- 
ing a five-bin histogram of that region 
(recording vertical, horizontal, diago- 
nal (45° and 135°) and non-directional 
edges) which are concatenated to form 
an 80-bin descriptor for the entire 
image (i.e., 4 x 4 x 5). [ISO02] 


edge image: An image where every 
pixel represents an edge or the edge 
magnitude. [Bor88] 


edge linking: See edge tracking. 


edge magnitude: A measure of the con- 
trast at an edge, typically the magni- 
tude of the intensity gradient at the 
edge point. See also edge detection and 
edge point. [JKS95:5.1] 


edge matching: See curve matching. 


edge motion: The motion of edges 
through a sequence of images. See also 
shape from motion and the aperture 
problem. [JKS95:14.2.1] 


edge orientation: See edge direction. 


edge point: 1) A location in an image 
where some quantity (e.g., intensity) 
changes rapidly. 
2) A location where the gradient is 
greater than some threshold. [FP03:Ch. 
8] 

edge-preserving smoothing: A
smoothing filter that is designed to
preserve the edges in the image while
reducing image noise. For example see
median filter. [WP:Edge-preserving_smoothing]


edge sharpening: See edge
enhancement.


edge tracking: 1) The grouping of 
edges into chains of significant edges. 
The second stage of edge-based 
segmentation. Also known as edge 
following, edge grouping and edge 
linking. 
2) Tracking how the edge moves in a 
video sequence. [Dav90:Ch. 4] 


edge type labeling: Classification of 
edge points or edges into a limited 
number of types (e.g., fold edge, 
shadow edge or occluding edge). 
[Dav90:6.11] 


edgel: Small pixel regions or sets of pixels
that exhibit edge-like characteristics.
Can also refer to the pixel which is closest
to that edge. More recently used as
an abbreviation of “edge pixel”. [NP87]


EGI: See extended Gaussian image. 


egomotion: The motion of the observer 
with respect to the observed scene. 
[FP03:17.5.1] 


egomotion estimation: Determination 
of the motion of a camera. Generally 
based on image features corresponding 
to static objects in the scene. See also 
structure and motion recovery. The fig- 
ure shows a typical image pair where 
the camera position is to be estimated.
[WP:Egomotion]

[Figure: image from position A; image from position B]


eigen-decomposition: Let $A$ be a square
$d \times d$ matrix with eigenvalues $\lambda_i$ and
eigenvectors $x_i$ for $i = 1, \ldots, d$. We
assume that the eigenvectors are linearly
independent. Then $A$ can
be expressed as $A = X \Lambda X^{-1}$, where
$X$ is the $d \times d$ matrix whose $i$th column
is $x_i$ and $\Lambda$ is a $d \times d$ diagonal
matrix whose diagonal entries are the
corresponding eigenvalues, i.e., $\Lambda_{ii} = \lambda_i$.
Note that for symmetric matrices
we have $A = X \Lambda X^{\top}$, as the eigenvectors
can be chosen such that they are
orthogonal to each other and have
norm one, so that $X$ is an orthonormal
matrix. The eigen-decomposition
is closely related to the singular value
decomposition (SVD); the two coincide
for symmetric positive semi-definite
matrices. [GL89:Chs 7, 8]
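
The symmetric case can be checked by hand for a 2 x 2 matrix. The sketch below assumes a real symmetric matrix [[a, b], [b, c]] and diagonalizes it with a single plane rotation; the function names are hypothetical and the closed form is specific to the 2 x 2 case. It then rebuilds the matrix as $X \Lambda X^{\top}$.

```python
import math


def eig_sym_2x2(a, b, c):
    """Eigen-decomposition of the symmetric matrix [[a, b], [b, c]].

    Returns (eigenvalues, X), with the eigenvectors as the columns of X.
    """
    theta = 0.5 * math.atan2(2.0 * b, a - c)  # rotation that removes b
    cs, sn = math.cos(theta), math.sin(theta)
    lam1 = a * cs * cs + 2.0 * b * cs * sn + c * sn * sn
    lam2 = a * sn * sn - 2.0 * b * cs * sn + c * cs * cs
    X = [[cs, -sn], [sn, cs]]
    return (lam1, lam2), X


def reconstruct(eigenvalues, X):
    """Compute X diag(eigenvalues) X^T for a 2 x 2 matrix X."""
    return [[sum(X[i][k] * eigenvalues[k] * X[j][k] for k in range(2))
             for j in range(2)] for i in range(2)]


eigenvalues, X = eig_sym_2x2(2.0, 1.0, 2.0)
print(eigenvalues)                  # approximately (3.0, 1.0)
print(reconstruct(eigenvalues, X))  # approximately [[2, 1], [1, 2]]
```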


eigenface: Let X be a matrix whose 
columns are images of faces. An eigen- 
face is an eigenvector obtained from 
the sample covariance matrix of X. 
These vectors can be used for face 
recognition. [FP03:p. 510] 


eigenspace-based recognition: Object 
recognition based on an eigenspace 
representation. [TV98:10.4]


eigenspace representation: The
approximation of an image region by 
a linear combination of basis vectors 
that can be thought of as matching 
between the eigenspace and the 
image. See also principal component 
analysis. [TV98:10.4.2] 


eigentracking: An approach to track- 
ing rigid and articulated objects that 
uses a view-based representation. The 
approach builds on an eigenspace 
representation, robust estimation tech- 
niques and parameterized optical flow 
estimation. [BJ98] 


eigenvalue: A scalar $\lambda$ that for a matrix $A$
satisfies $Ax = \lambda x$ where $x$ is a nonzero
vector (an eigenvector). [SQ04:2.2.3]


eigenvector: A nonzero vector $x$ that for
a matrix $A$ satisfies $Ax = \lambda x$ where $\lambda$ is
a scalar (the eigenvalue). [SQ04:2.2.3]


eigenvector projection: Projection 
onto the principal component analysis 
(PCA) basis vectors. [SQ04:13.1.4] 


elastic matching: Optimization tech- 
nique that minimizes the warp- 
deformation cost between locally vari- 
able data and the corresponding fea- 
tures or pixels in a given model. 
Often used in model-based object 
recognition. [WP:Elastic_matching] 


elastic registration: A higher order 
version of elastic matching where 
corresponding surfaces or points are 
matched or merged by minimizing 
some warp-deformation cost. [SD02] 


electromagnetic spectrum: The entire 
range of frequencies of electromag- 
netic waves including X-rays, ultravi- 
olet, visible light, infrared, microwave 
and radio waves. [Hec87:3.6]


ellipse detection: Algorithms for finding
either geometrically skewed circles or
actual ellipses. Ellipse detection is one
of the key problems in image processing
as it can be particularly useful in
retrieving the parameters of projective
geometry. There are five major parameters:
$(x_0, y_0)$ for the center, $\alpha$ for the
orientation, $(a, b)$ for the major and
minor axes. Many approaches (e.g., the
Hough transform or RANSAC) use the
edge pixels in an image as constraints
in searching this space. [HF98]

[Figure: ellipse detected in pupil tracker]


ellipse fitting: Fitting of an ellipse model 
to the boundary of some shape or data 
points. [TV98:5.3]


ellipsoid: A 3D volume in which all 
plane cross sections are ellipses or circles.
An ellipsoid is the set of points
$(x, y, z)$ satisfying $x^2/a^2 + y^2/b^2 + z^2/c^2 = 1$. In
computer vision, an ellipsoid is a basic
shape primitive that can be combined
with other primitives in order to
describe a complex shape. [SQ04:9.9] 


elliptic snake: An active contour model 
of an ellipse whose parameters are esti- 
mated through energy minimization 
from an initial position. [SKB+01] 


elongatedness: A shape representation
that measures how long a shape is with
respect to its width (i.e., the ratio of
the length of the bounding box to its
width). See also eccentricity.
[WP:Elongatedness]


EM: See expectation maximization. 


emotional state recognition: Auto- 
mated analysis and classification of the 
external signs of mood or mental well- 
being of an individual using a mix of 
machine vision and sensors. [PVJ01] 


empirical color representation: User- 
centered formulations of the return 
spectral frequency of light reflections. 
They fall into two categories: statistical 
representations, based on ignoring the 
intensity component of light and modeling
with chromaticity [AB01]; basis
function representations that are generated
by modeling the data in terms
of various bases. [JF99] 


empirical evaluation: Evaluation of 
computer vision algorithms in order 
to characterize their performance by 
comparing the results of several algo- 
rithms on standardized test problems. 
Careful evaluation is a difficult research 
problem in its own right. [BP98:Ch. 1] 


empirical mode decomposition: A 
method for analyzing natural sig- 
nals, which are most often nonlinear 
and non-stationary, by decomposition 
without leaving the time domain. It 
can be compared to methods of anal- 
ysis such as 2D Fourier transforms 
and wavelet decomposition although 
the basis function representations are 
empirically derived. The technique is 
defined by an algorithm rather than 
theory. For a signal $x(t)$:

1. Identify all extrema of $x(t)$.

2. Interpolate between minima (resp.
maxima), ending up with some
envelope $e_{\min}(t)$ (resp. $e_{\max}(t)$).

3. Compute the mean
$m(t) = (e_{\min}(t) + e_{\max}(t))/2$.

4. Extract the detail
$x(t + 1) = x(t) - m(t)$.

5. Iterate on the residual $x(t + 1)$.

[HSL+98] 


encoding: Converting a digital signal, 
represented as a set of values, from one 
form to another, often to compress the 
signal. In lossy encoding, information
is lost in the process and the decoding 
algorithm cannot recover it. See also 
MPEG and JPEG image compression. 
[WP:Code] 


endoscope: An instrument for visually 
examining the interior of various bod- 
ily organs. See also fiberscope. [WP: 
Endoscopy] 


energy minimization: The problem 
of determining the absolute mini- 
mum of a multi-variate function rep- 
resenting (by a potential energy-like 
penalty) the distance of a poten- 
tial solution from the optimal solu- 
tion. It is a specialization of the 
optimization problem. Two popular 
minimization algorithms in computer 
vision are the Levenberg-Marquardt 
optimization and Newton optimization 
methods. [WP:Energy_minimization] 


ensemble learning: The combination 
of multiple learning models to try to 
yield better predictive performance 
than the individual models. Tech- 
niques include bagging, boosting and 
Bayesian statistical model combina- 
tions. [Mur12:14.1] 


entropy: 1) Colloquially, the amount of 
disorder in a system. 
2) A measure of the information content
of a random variable $X$. Given that
$X$ has a set of possible values or outcomes
$\mathcal{X}$, with probabilities $\{P(x), x \in \mathcal{X}\}$,
the entropy $H(X)$ of $X$ is defined
as

$$H(X) = -\sum_{x \in \mathcal{X}} P(x) \log P(x)$$

with the understanding that $0 \log 0 := 0$.
For a multi-variate distribution, the
joint entropy $H(X, Y)$ of $X, Y$ is

$$H(X, Y) = -\sum_{(x, y) \in \mathcal{X} \times \mathcal{Y}} P(x, y) \log P(x, y)$$

For a set of values represented as a
histogram, the entropy of the set may
be defined as the entropy of the probability
distribution function represented
by the histogram. The figure shows
$-p \log p$ as a function of $p$ (left):
probabilities near 0 and 1 signal low
entropy, probabilities between those
values are more entropic. It also shows
the entropy of the gray scale histograms
in some windows on an image (right).
[CT91:Ch. 2]
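
The histogram form of the definition can be sketched as follows. The example assumes the histogram is a list of non-negative bin counts and uses base-2 logarithms, so the result is in bits; the function name `histogram_entropy` is hypothetical.

```python
import math


def histogram_entropy(counts):
    """Entropy in bits of the distribution given by histogram bin counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("histogram must contain at least one sample")
    entropy = 0.0
    for n in counts:
        if n > 0:  # empty bins contribute nothing, since 0 log 0 := 0
            p = n / total
            entropy -= p * math.log2(p)
    return entropy


print(histogram_entropy([10, 0, 0, 0]))  # 0.0, all mass in one bin
print(histogram_entropy([5, 5, 5, 5]))   # 2.0, uniform over four bins
```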


epipolar constraint: A geometric constraint
reducing the dimensionality of
the stereo correspondence problem. 
For any point in one image, the pos- 
sible matching points in the other 
image are constrained to lie on a line 
known as the epipolar line. This con- 
straint may be described mathemati- 
cally using the fundamental matrix. See 
also epipolar geometry. [FP03:10.1.1] 


epipolar correspondence matching: 
Stereo matching using the epipolar 
constraint. [ZDFL95] 


epipolar geometry: The geometric rela- 
tionship between two perspective 
cameras. [FP03:10.1.1]


epipolar line: The intersection of
the epipolar plane with the image 
plane. See also epipolar constraint. 
[FP03:10.1.1] 


epipolar plane: The plane defined 
by any real-world scene point and 
the optical centers of two cameras. 
[FP03:10.1.1] [BBM87] 


epipolar plane image (EPI): An image 
that shows how a particular line from 
a camera changes as the camera posi- 
tion is changed such that the image line 
remains on the same epipolar plane. 
Each line in the EPI is a copy of the 
relevant line from the camera at a dif- 
ferent time. Features that are distant 
from the camera remain in the same 
position in each line; features that are 
close to the camera move from line to 
line (the closer the feature the further 
it moves). [Low91:17.3.4]


epipolar plane image analysis: An
approach to determining shape from 
motion in which epipolar plane images 
(EPIs) are analyzed. The slope of lines 
in an EPI is proportional to the distance 
of the object from the camera, where 
vertical lines correspond to features at 
infinity. [Low91:17.3.4] 


epipolar plane motion: See epipolar 
plane image analysis. 


epipolar rectification: The image 
rectification of stereo images so that 
the epipolar lines are aligned with the 
image rows (or columns). [AF05] 


epipolar transfer: The transfer of corresponding
epipolar lines in a stereo pair
of images, defined by a homography.
See also stereo and stereo vision.
[FP03:10.1.4]


epipole: The point through which all
epipolar lines from a camera appear
to pass. See also epipolar geometry.
[FP03:10.1.1]

[Figure: image with epipolar lines]


epipole location: The operation of
locating the epipoles. [Fau93:6.2.1.2]


equalization: See histogram
equalization.


erode operator: The operation of reducing
a binary or gray scale object
with respect to the background. This
has the effect of removing any isolated
object regions and separating any
object regions that are only connected
by a thin section. Most frequently
described as a morphological
transformation. The dual of the dilate
operator. [Low91:8.2]


error propagation: 1) The propagation
of errors resulting from one computation
to the next computation.
2) The estimation of the error (e.g.,
variance) of a process based on the
estimates of the error in the input data
and intermediate computations. [WP:
Propagation_of_uncertainty]


essential matrix: In binocular stereo,
a matrix $E$ expressing a bilinear constraint
between corresponding image
points $u$, $u'$ in camera coordinates:
$u'^{\top} E u = 0$. This constraint is the basis
for several reconstruction algorithms.
$E$ is a function of the translation and
rotation of the camera in the world
reference frame. See also fundamental
matrix. [FP03:10.1.2]

Euclidean distance: The geometric distance
between two points $(x_1, y_1)$ and
$(x_2, y_2)$, i.e., $\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$.
For $n$-dimensional vectors $\mathbf{x}_1$ and $\mathbf{x}_2$,
the distance is $\left(\sum_{i=1}^{n} (x_{1,i} - x_{2,i})^2\right)^{1/2}$.
[SQ04:9.1]


Euclidean geometry: A system of geom- 
etry based on Euclid’s five axioms. 
Negation of the parallel postulate gives 
rise to non-Euclidean geometries. [WP: 
Euclidean_geometry] 


Euclidean reconstruction: 3D recon- 
struction of a scene using a Euclidean 
frame of reference, as opposed to 
an affine reconstruction or projective 
reconstruction. The most complete 
reconstruction achievable, for exam- 
ple, using stereo vision. [Har94] 


Euclidean space: A representation of 
the space of all n-tuples (where n
is the dimensionality). For example 
the three-dimensional Euclidean space 
(X, Y, Z) is typically used to describe 
the real world. Also known as Carte- 
sian space (see Cartesian coordinates). 
[WP:Euclidean_space] 


Euclidean transformation: A trans- 
formation that operates in Euclidean 
space (i.e., maintaining the Euclidean 
spatial arrangements). Examples 
include rotation and translation. Often 
applied to homogeneous coordinates. 
[FP03:2.1.2] [SQ04:7.3] 


Euler angle: A set of angles $(\alpha, \beta, \gamma)$
describing rotations in three- 
dimensional space. [JKS95:12.2.1] 


Euler-Lagrange equations: The basic 
equations in the calculus of variations, 
a branch of calculus concerned with 
maxima and minima of definite inte- 
grals. They occur, e.g., in Lagrangian 
mechanics and have been used in com- 
puter vision for a variety of optimiza- 
tions, including for surface interpola- 
tion. See also variational approach and 
variational problem. [TV98:9.4.2] 


Euler number: The number of contigu- 
ous parts (regions) less the number 
of holes. Also known as the genus. 
[Jai89:9.10] 


even field: The first of the two fields in 
an interlaced video signal. [Jai89:11.1] 


even function: A function where
$f(x) = f(-x)$ for all $x$.
[WP:Even_and_odd_functions#Even_functions]


event analysis: See event
understanding. [WP:Event_study] 


event detection: Analysis of a sequence 
of images to detect activities in the 
scene. [MCB+01]

[Figure: image from a sequence of images; movement detected in the image]


event understanding: Recognition of 
an event (such as a person walking) 
in a sequence of images. Based on the 
data provided by event detection. [WP: 
Event_study] 


exhaustive matching: Matching 
method in which all possibilities are 
considered. As an alternative see 
hypothesize and verify. [FPZ05] 


expectation maximization (EM): An 
iterative method of carrying out 
maximum likelihood estimation for 
models that have missing data, espe- 
cially for latent variable models. In the 
expectation (E) step, the expectation 
values of the latent variables are com- 
puted. In the maximization (M) step, 
the complete data log likelihood is 
maximized with respect to the parame- 
ters of the model. The iteration is guar- 
anteed to converge to a local maxi- 
mum of the log likelihood. EM is the 
standard method for fitting mixture 
models. [Bis06:9.3-9.4] 
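
As a hedged sketch of the two steps, the code below fits a two-component Gaussian mixture to one-dimensional data. The initialization from the data extremes, the fixed iteration count, the variance floor and the name `em_two_gaussians` are assumptions made for this example. The log likelihood is recorded at each iteration so that its non-decreasing behavior can be inspected.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(data, iterations=50):
    """EM for a two-component 1D Gaussian mixture (illustrative only)."""
    means = [min(data), max(data)]  # crude assumed initialization
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    log_likelihoods = []
    for _ in range(iterations):
        # E step: expected component membership (responsibility) per point.
        resp = []
        log_lik = 0.0
        for x in data:
            joint = [weights[k] * normal_pdf(x, means[k], variances[k])
                     for k in range(2)]
            total = joint[0] + joint[1]
            log_lik += math.log(total)
            resp.append([j / total for j in joint])
        log_likelihoods.append(log_lik)
        # M step: re-estimate the parameters from the weighted data.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-6)  # assumed floor against collapse
            weights[k] = nk / len(data)
    return means, variances, weights, log_likelihoods


data = [0.1, -0.2, 0.3, 0.0, -0.1, 5.1, 4.8, 5.2, 5.0, 4.9]
means, variances, weights, log_likelihoods = em_two_gaussians(data)
print(means)  # close to [0.02, 5.0]
```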


expectation value: Consider a random
variable $x$ and a function $f(x)$. The
expectation value of $f(x)$ is defined as
$E(f(x)) = \int f(x) p(x) \, dx$. For $f(x) = x$
the result is the mean. [Bis06:1.2.2]


expert system: A system that uses avail- 
able knowledge and heuristics to solve 
problems. See also knowledge-based 
vision. [Low91:11.2] 


exponential smoothing: A method
for predicting a data value ($P_{t+1}$)
based on the previous observed
value ($D_t$) and the previous prediction
($P_t$): $P_{t+1} = \alpha D_t + (1 - \alpha) P_t$,
where $\alpha$ is a weighting value
between 0 and 1. [WP:Exponential_smoothing]

[Figure: value against time, with a curve labeled $P_t$ ($\alpha = 1.0$)]
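
The exponential smoothing recurrence can be sketched in a few lines. The example assumes the first prediction is initialized to the first observation unless another value is supplied; the function name and the `initial_prediction` parameter are hypothetical.

```python
def exponential_smoothing(data, alpha, initial_prediction=None):
    """Return the predictions P_{t+1} = alpha * D_t + (1 - alpha) * P_t.

    The entry at index t is the prediction made after observing data[t].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    prediction = data[0] if initial_prediction is None else initial_prediction
    predictions = []
    for observed in data:
        prediction = alpha * observed + (1.0 - alpha) * prediction
        predictions.append(prediction)
    return predictions


print(exponential_smoothing([2, 4, 6], alpha=0.5))  # [2.0, 3.0, 4.5]
print(exponential_smoothing([2, 4, 6], alpha=1.0))  # [2.0, 4.0, 6.0]
```

With the weighting value set to 1 the prediction simply repeats the last observation, and with values near 0 it changes very slowly.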


exponential transformation: See pixel 
exponential operator. 


exposure time: The length of time the 
CCD or CMOS sensor of a digital camera
is exposed to light for the capture of 
a single image frame. In conjunction 
with the size of the lens aperture this 
controls the amount of light reaching 
the image-forming sensor. [Sze10:2.3] 


expression understanding: See facial 
expression analysis. 


extended Gaussian image (EGI): Use 
of a Gaussian sphere for histogram- 
ming surface normals. Each surface 
normal is considered from the center 
of the sphere; the value associated with 
the surface patch with which it intersects
is incremented. [FP03:20.3]


extended light source: A light source
that has a significant size relative to the
scene, i.e., is not approximated well by
a point light source. This type of light
source has a diameter and hence can
produce fuzzy shadows. [Hor86:10.5]

[Figure: light source casting regions of no shadow, fuzzy shadow and complete shadow]


exterior orientation: The position of a 
camera in a global coordinate system, 
which is determined by an absolute 
orientation calculation. [FP03:3.4] 


external energy (or force): A measure 
of fit between the image data and an 
active shape model that is part of the 
model’s deformation energy. This mea- 
sure is used to deform the model to the 
image data. [SQ04:8.5.1] 


extremal point: Points that lie on 
the boundary of the smallest con- 
vex region enclosing a set of points 
(i.e., that lie on the convex hull).
[SOS00:4.6.1] 


extrinsic parameters: See exterior
orientation.


eye location: The task of finding eyes 
in images of faces. Approaches include 
blink detection and face feature 
detection. [MDWW04] 


eye tracking: Tracking the position of 
the eyes in a face image sequence; 
also gaze direction tracking. [WP:
Eye_tracking]


f-number: In an optical system, the ratio
between the focal length of a lens and 
its diameter. [FP03:1.2] 


face analysis: A general term covering 
the analysis of face images and models. 
Often used to refer to facial expression 
analysis. [KCT00]


face authentication: Verification that 
(the image of) a face corresponds to a 
particular individual. This differs from 
face recognition in that here only the 
model of a single person is considered.
[WP:Facial_recognition_system] 


face detection: Identification of faces 
within an image or a series of images. 
This often involves a combination of 
human motion analysis and skin color 
analysis. [WP:Face_detection]


face feature detection: The location
of features (such as eyes, nose and
mouth) from a human face. Normally
performed after face detection
although it can be used as part of face
detection. [WP:Face_detection]


face identification: See face 
recognition. [WP:Facial_recognition_ 
system] 


face indexing: Indexing from a database 
of known faces as a precursor to face 
recognition. [LW00b]


face liveness detection: Methods of 
avoiding the problem of spoofing facial 
recognition biometrics by using a pho- 
tograph, 3D model or video of a face 
rather than a real face. These methods 
include detecting various facial move- 
ments as well as features that are hard 
to fake, such as vasculature. [PWS08] 


face modeling: Representing a face 
using some type of model typi- 
cally derived from an image (or 
images). These models are used in face 
authentication, face recognition etc. 
[HJ01] 


face recognition: The task of recognizing
a face from an image as an instance
of a person recorded in a database
of faces. [WP:Facial_recognition_system]


face tracking: Tracking of a face in
a sequence of images. Often used as 
part of a human-computer interface. 
[HC00] 


face verification: See face 
authentication. [WP:Facial_ 
recognition_system] 


facet-model extraction: The extraction 
from range data of a model based 
on facets (small simple surfaces). See 
also planar patch extraction and planar 
facet model. [Har84] 


facial action coding system (FACS): 
An attempt to categorize the entire 
range of facial motion primarily 
through mapping the physical changes 
that accompany emotions. The sys- 
tem was originally developed by 
the psychologists Ekman and Friesen 
[EF78]. It comprises so-called “action 
units” (AUs), which are the fundamen- 
tal actions of individual muscles or 
groups, and action descriptors (ADs) 
which are unitary movements that may 
involve several groups in succession. 


facial animation: Computer graphics 
methods of simulating the human 
head and face. See also facial action 
coding system. [WP:Computer_facial_ 
animation] 


facial expression analysis: Study or 
identification of the facial expres- 
sions (i.e., quantitative measurement 
of movements of the human head and 
face) of a person from an image or 
sequence of images. See also facial action
coding system. [TKC05]

[Figure: example expressions: happy, perplexed, surprised]


factor analysis: A latent variable model.
The latent variables $z$ have independent
Gaussian distributions and there
is a linear relationship between the
latent and visible variables $x$, i.e.,
$x = Wz + \eta$, where $\eta$ is independent
Gaussian noise. Probabilistic principal
component analysis is a special case
of factor analysis where the noise
variables all have the same variance.
[Bis06:12.2.4]


factorization: See motion factorization. 


false alarm: See false positive. 


false negative: A binary classifier c(x) 
returns + or - for examples x. A false 
negative occurs when the classifier 
returns - for an example that is in real- 
ity +. [Mur12:8.3.4] 


false positive: A binary classifier c(x) 
returns + or - for examples x. A 
false positive occurs when the classi- 
fier returns + for an example that is in 
reality -. [Mur12:8.3.4] 
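
The definitions of false negative and false positive can be made concrete by counting both kinds of error over a labeled set. In this small sketch the classifier outputs and the true labels are lists of the strings "+" and "-"; the function name `count_errors` is hypothetical.

```python
def count_errors(predicted, actual):
    """Return (false_positives, false_negatives) for +/- labels."""
    if len(predicted) != len(actual):
        raise ValueError("label lists must have the same length")
    false_positives = sum(1 for p, a in zip(predicted, actual)
                          if p == "+" and a == "-")
    false_negatives = sum(1 for p, a in zip(predicted, actual)
                          if p == "-" and a == "+")
    return false_positives, false_negatives


predicted = ["+", "+", "-", "-", "+"]
actual = ["+", "-", "-", "+", "+"]
print(count_errors(predicted, actual))  # (1, 1)
```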


fan-beam reconstruction: A reconstruction
method for building images
taken by projecting radiation at differ- 
ent angles from a single point and then 
measuring the resultant return from 
various sensors. A real-world applica- 
tion is X-ray absorption tomography, 
where projections are formed by mea- 
suring the attenuation of radiation that 
passes through a physical specimen at 
different angles. [Par82]

[Figure: point source and equi-spaced detectors]


far light source: A light source far from 
the illuminated object such that the 
rays of light are effectively parallel.
See also near light source.

[Figure: far light source and near light source]


FAST interest point detector: A spatio- 
temporal interest point detector. A 
corner feature detector which employs 
a machine learning algorithm to 
yield large speed increases by learn- 
ing discriminative properties. Often 
used with non-maximal suppression to
improve noise resilience. The figure
shows the detected feature points in
the image. [RD06]


fast Fourier transform (FFT): A version
of the Fourier transform for discrete
samples that is significantly more
efficient (order $N \log_2 N$) than the standard
discrete Fourier transform (which
is order $N^2$) on data sets with $N$ points.
[Low91:13.5]
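
The difference in cost can be seen by comparing a direct evaluation of the discrete Fourier transform with a recursive radix-2 FFT. The sketch below assumes an input length that is a power of two and is written for clarity rather than speed; the function names `dft` and `fft` are chosen for this example.

```python
import cmath


def dft(x):
    """Direct discrete Fourier transform, order N^2 operations."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * m / n) for k in range(n))
            for m in range(n)]


def fft(x):
    """Recursive radix-2 FFT, order N log2 N; len(x) must be a power of two."""
    n = len(x)
    if n == 0:
        raise ValueError("input must not be empty")
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])  # transform of the even-indexed samples
    odd = fft(x[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for m in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * m / n) * odd[m]
        out[m] = even[m] + twiddle
        out[m + n // 2] = even[m] - twiddle
    return out


samples = [1, 2, 3, 4, 0, -1, -2, -3]
difference = max(abs(a - b) for a, b in zip(fft(samples), dft(samples)))
print(difference < 1e-9)  # True
```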


fast marching method: A type of 
level set method in which the 
search can move in only one direc- 
tion (hence it being faster). [WP: 
Fast_marching_method]


feature: 1) A distinctive part of some- 
thing (e.g., the nose and eyes are dis- 
tinctive features of the face) or an 
attribute derived from an object or 
shape (e.g., circularity). See also image 
feature. 
2) A numerical property (possibly 
combined with others to form a 
feature vector) and generally used in 
a classifier. [TV98:4.1] 


feature-based optical flow estima- 
tion: Calculation of optical flow in 
a sequence of images from image 
features. [HB93] 


feature-based stereo: A solution to 
the stereo correspondence problem in 
which image features from two images
are compared. The main alternative
approach is correlation-based stereo. 
[Gri85] 


feature-based tracking: Tracking the 
motion of image features through a 
sequence. [TV98:8.4.2] 


feature contrast: The difference 
between two features. This can be 
measured in many domains (e.g., inten- 
sity, orientation etc.). [Umb98:2.6.1] 


feature descriptor: See SIFT descriptor 
and SURF. 


feature detection: Identification of 
given features in an image (or model). 
For example, see corner detection. 
[Umb98:2.6] 


feature extraction: See feature
detection.


feature fusion: Methods to improve the 
robustness of recognition by simulta- 
neously using different feature filters 
while examining the co-occurrence 
probability of the extracted joint fea- 
tures. [SZL+05] 


feature location: See feature detection. 


feature matching: Matching of image 
features in several images of the same 
object (e.g., feature-based stereo) or 
of features from an unknown object 
with features from known objects 
(feature-based object recognition). 
[TV98:8.4.2] 


feature motion: The degree to which 
objects in the image plane move. 


feature orientation: The orientation of 
an image feature with respect to the 
image frame of reference. [RCJ95] 


feature point: The image location at 
which a particular feature is found. 
[KLT05]


feature point correspondence: Match- 
ing feature points in two or more 
images. The assumption is that the fea- 
ture points are the image of the same 
scene point. Having the correspon- 
dence allows estimation of the depth 
from binocular stereo, fundamental 
matrix, homography or trifocal tensor 
in the case of 3D scene structure recov- 
ery or of the 3D target motion in the 
case of target tracking. [TV98:8.4.2] 


feature point tracking: Tracking of 
individual image features in a sequence 
of images. [SK05] 


feature representation: The way in 
which parameters of various metrics 
are stored for interest points in 2D 
or 3D images. These feature repre- 
sentations are generally stored as vec- 
tors with one entry per parameter. 
A good example of this is the SIFT 
descriptor, where each entry in the 
feature vector relates to a feature 
which is translation invariant, scaling 
invariant and rotation invariant, or par- 
tially invariant to illumination changes 
and robust to local geometric distor- 
tions. [Sze10:Ch. 4] 


feature selection: Selection of suit- 
able features (properties) for a spe- 
cific task, e.g., classification. Typi- 
cally features should be independent, 
detectable, discriminatory and reliable. 
[FP03:22.3] 


feature similarity: How much two fea- 
tures resemble each other. Measures 
of feature similarity are required for 
feature-based stereo, feature-based 
tracking, feature matching etc. 
[Umb98:2.6.1]


feature space: The dimensions of a fea- 
ture space are the feature (property) 
values of a given problem. An object 
or shape is mapped to feature space by 
computing the values of the set of fea- 
tures defining the space, typically for 
recognition and classification. In the 
figure, shapes are mapped to a 2D fea- 
ture space defined by area and rectan- 
gularity: [Umb98:2.6.1] 


[Figure: shapes plotted in a 2D feature space with axes Area and Rectangularity]


feature stabilization: A technique for 
stabilizing the position of an image fea- 
ture in an image sequence so that it 
remains in a particular position on a 
display (allowing or causing the rest of 
the image to move relative to that fea- 
ture): [MC96] 


[Figure: original sequence, stabilized sequence and stabilized feature]


feature tracking: See feature-based 
tracking. 


feature vector: A vector formed by 
the values of a number of image 
features (properties), typically all asso- 
ciated with the same object or image. 
[Umb98:2.6.1]


feedback: The use of outputs from a sys- 
tem to control the system’s actions. 
[WP:Feedback] 


FERET: A standard database of face 
images with a defined experimental 
protocol for the testing and compar- 
ison of face recognition algorithms. 
[WP:FERET_database] 


Feret’s diameter: The distance between 
two parallel lines at the extremities 
of some shape that are tangential to 
the boundary of the shape. Maximum, 
minimum and mean values of Feret’s 
diameter are often used (where every 
possible pair of parallel tangent lines is 
considered): [WP:Feret_diameter] 


[Figure: maximum Feret’s diameter of a shape]
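
The definition of Feret’s diameter can be illustrated with a minimal sketch that projects a set of boundary points onto directions sampled at a fixed angular step and records the extent of each projection. The function name, the 1 degree step and the rectangle are assumptions chosen for illustration; the sampling gives only an approximation to the true maximum and minimum.

```python
import math

def feret_diameters(points, angle_step_deg=1.0):
    """Approximate (min, max, mean) Feret diameters of a 2D point set.

    For each sampled direction, the distance between the two parallel
    tangent lines equals the extent of the points projected onto the
    direction perpendicular to those lines.
    """
    widths = []
    steps = int(round(180.0 / angle_step_deg))
    for k in range(steps):
        theta = math.radians(k * angle_step_deg)
        dx, dy = math.cos(theta), math.sin(theta)
        projections = [x * dx + y * dy for x, y in points]
        widths.append(max(projections) - min(projections))
    return min(widths), max(widths), sum(widths) / len(widths)

# Illustrative shape: the corners of a 4 x 2 rectangle.
rectangle = [(0, 0), (4, 0), (4, 2), (0, 2)]
f_min, f_max, f_mean = feret_diameters(rectangle)
print(f"min = {f_min:.3f}, max = {f_max:.3f}, mean = {f_mean:.3f}")
```

For the rectangle, the minimum is its short side and the maximum is close to its diagonal; a finer angular step tightens the approximation.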


FFT: See fast Fourier transform.


fiber optics: A medium for transmitting 
light that consists of very thin glass or 
plastic fibers. It can be used to provide 
much higher bandwidth for signals 
encoded as patterns of light pulses. 
Alternately, it can be used to trans- 
mit images directly through rigidly con- 
nected bundles of fibers, so as to see 
around corners, past obstacles etc. 
[Hec87:5.6] 


fiberscope: A flexible fiber-optic instru- 
ment allowing parts of an object to be 
viewed that would normally be inac- 
cessible. Most often used in medical 
examinations. [WP:Fiberscope] 


fiducial point: A reference point for a 
given algorithm, e.g., a fixed, known, 
easily detectable pattern for a calibra- 
tion algorithm. [WP:Fiducial_marker] 


field of view (FOV): The linear or angular
limit, $\theta$, that may be imaged in a
given imaging system because of the
lens (optics) or sensor in use:


[Figure: eye point, line of sight and vertical FOV]


The field of view depends on the ratio
between the sensor width, $W$, and the
focal length, $f$, of the lens such that
$\tan(\theta/2) = W/(2f)$. [Sze10:2.2]
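
As a minimal sketch of the relation between sensor width and focal length, the following assumes that $\theta$ denotes the full angular field of view, so that $\tan(\theta/2) = W/(2f)$. The function name and the numerical values are illustrative assumptions, not taken from the cited source.

```python
import math

def field_of_view_deg(sensor_width_mm: float, focal_length_mm: float) -> float:
    """Full angular field of view in degrees, from tan(theta/2) = W / (2 f)."""
    return math.degrees(2.0 * math.atan(sensor_width_mm / (2.0 * focal_length_mm)))

# Illustrative values only: a 36 mm wide sensor behind a 50 mm lens.
fov = field_of_view_deg(36.0, 50.0)
print(f"horizontal FOV = {fov:.1f} degrees")
```

Halving the focal length for the same sensor widens the field of view, which is why short focal lengths are associated with wide angle lenses.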


figure-ground separation: The 
segmentation of the area of the image 
representing the object of interest 
(the figure) from the remainder of the 
image (the background): [GW91] 


[Figure: figure and ground regions]


figure of merit: Any scalar that is used
to characterize the performance of an 
algorithm. [WP:Figure_of_merit] 


filter: In general, any algorithm that 
transforms a signal into another. For 
instance, bandpass filters remove or 
reduce the parts of an input sig- 
nal outside a given frequency inter- 
val; gradient filters allow only image 
gradients to pass through; smoothing 
filters attenuate high frequencies. 
[Dav90:Ch. 3] 


filter ringing: A type of distortion 
caused by the application of a steep 
recursive filter. Normally this term 
applies to electronic filters in which 
certain components (e.g., capacitors 
and inductors) can store and later 
release energy, but there are also digital
equivalents of this effect. [WP:
Ringing_artifacts]


filtering: Application of a filter.
[BB82:3.1]


filtering threshold: Filtering systems 
generally generate a numeric score 
indicating how well a feature matches 
a given profile. Features with
scores above the specified filtering
threshold are then selected, e.g., in
cross-correlation matching.


fingerprint database indexing: 
Indexing into a database of finger- 
prints using a number of features 
derived from the fingerprints. This 
allows a smaller number of fingerprints 
to be considered when attempting 
fingerprint identification. [RKCJ96] 


fingerprint identification: Identifica- 
tion of an individual through com- 
parison of an unknown fingerprint 
(or fingerprints) with known finger- 
prints. [WP:Automated_fingerprint_ 
identification] 


fingerprint indexing: See fingerprint 
database indexing. 


fingerprint minutiae: In fingerprint 
identification, the major features of a 
fingerprint, such as a loop or whorl. 
[MMJP09] 


finite element model: A class of
numerical methods for solving differential
problems. Another relevant
class is finite difference methods. [WP:
Finite_element_method]


finite impulse response filter (FIR):
A filter that produces an output value
$y_n$ based on the current and past
input values $x_i$: $y_n = \sum_{i=0}^{p} a_i x_{n-i}$,
where $a_i$ are weights. See also infinite
impulse response filters. [Jai89:2.3]
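
A finite impulse response filter can be sketched directly from the weighted sum $y_n = \sum_i a_i x_{n-i}$. The sketch below assumes that inputs before the start of the signal are zero; the function name and the example weights are illustrative assumptions.

```python
def fir_filter(x, a):
    """Return y with y[n] = sum_i a[i] * x[n - i], treating x[k] as 0 for k < 0."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for i, weight in enumerate(a):
            if n - i >= 0:
                acc += weight * x[n - i]
        y.append(acc)
    return y

# A 3-tap moving average applied to a unit step.
step = [1.0, 1.0, 1.0, 1.0, 1.0]
weights = [1 / 3, 1 / 3, 1 / 3]
print(fir_filter(step, weights))
```

Feeding a unit impulse into the filter returns the weights themselves and then zeros, which is the sense in which the impulse response is finite.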


FIR: See finite impulse response filter. 


Firewire (IEEE 1394): A serial digital 
bus system supporting 400 Mbits per 
second. Power, control and data sig- 
nals are carried in a single cable. The 
bus system makes it possible to address 
up to 64 cameras from a single inter- 
face card and multiple computers can 
acquire images from the same camera 
simultaneously. [WP:IEEE_1394] 


first derivative filter: See gradient filter. 


first fundamental form: See surface 
curvature. 


Fisher linear discriminant (FLD): A
classification method that maps high-dimensional
K class data into K - 1
dimensions in such a way as to maximize
class separability. [DH73:4.10]


Fisher-Rao metric: The Rao distance
provides a measure of difference
between two probability distributions.
Given two probability densities
$p$ and $q$ which are members of a
parameterized family with the parameter
$a$, suppose that $p(x) = p(x|a)$,
$q(x) = p(x|a + \delta a)$ where $\delta a$ is a small
change in $a$. To a first approximation,
$D(p, q)$ is a squared distance,
in that $D(p(\cdot|a), p(\cdot|a + \delta a)) =
\delta a^{\top} J(a)\, \delta a + \text{higher-order terms}$,
where $J(a)$ is the matrix defining a
Riemannian metric on the subset from
which $a$ takes its values. This metric
is known as the Fisher-Rao metric.
[May05]


fisheye lens: See wide angle lens. [WP: 
Fisheye_lens] 


fitness function: The name given in 
the genetic algorithms literature for the 
objective function which is to be opti- 
mized. The fitness function evaluates a 
given member of the genetic algorithm 
population returning a value propor- 
tional to its optimality. [Mit97:9.2] 


fixation: The physiological conse- 
quence of fixing visual attention on a 
single location with the eyes or sen- 
sors pointing fixedly at that location. 
[Dod07] 


flat field: 1) An object of uniform color, 
used for photometric calibration of 
optical systems. 
2) A camera system is flat-field cor- 
rect if the gray scale output at each 
pixel is the same for a given light input. 
[Jai89:4.4] 


flexible template: A model of a shape in 
which the relative position of points is 
not fixed (e.g., defined in probabilistic 
form). This approach allows for varia- 
tions in the appearance of the shape. 
[HTC92] 


FLIR: Forward-looking infrared. An 
infrared system mounted on a vehicle 
looking along the direction of travel: 
[WP:Forward_looking_infrared] 


[Figure: vehicle with forward-looking infrared sensor]


flow field: See optical flow field. 


flow histogram: A histogram of the 
optical flow in an image sequence. This 
can be used, e.g., to provide a qualita- 
tive description of the motion of the 
observer. [DTS06] 


flow vector field: Optical flow is 
described by a vector (magnitude and 
orientation) for each image point. 
Hence a flow vector field is the same 
as an optical flow field. [Fau93:9.2] 


fluorescence: The emission of visible 
light by a substance caused by the 
absorption of some other (possibly 
invisible) electromagnetic wavelength. 
This property is sometimes used in 
industrial machine vision. [FP03:4.2] 


fluorescence microscopy: A technique 
of close examination used to study 
specimens which can themselves be 
made to fluoresce. It is based on 
exploiting the phenomenon that cer- 
tain materials emit energy detectable as 
visible light when irradiated with light 
at a specific wavelength. The sample
may either fluoresce itself or be treated
with chemicals which do. [HP98] 


fluorescent lighting: A light source 
where the central mechanism of pho- 
ton release is electrical excitation of an 
inert gas containing mercury. Electrons 
traveling through the gas collide with 
the mercury atoms releasing photons 
which collide with a phosphor coating 
creating light. Manufacturers can vary 
the color of the light produced by using 
different combinations of phosphors 
in the coatings. [WP:Fluorescent_ 
lamp] 


fMRI: Functional magnetic resonance
imaging (fMRI) is a technique for
identifying which parts of the brain
are activated by different types of
physical stimulation, e.g., visual or
acoustic stimuli. An MRI scanner
is set up to register the increased
blood flow to the activated areas of
the brain on an fMRI scan. See also
nuclear magnetic resonance.
[WP:Functional_magnetic_resonance_imaging]

FOA: See focus of attention.
[WP:Attention#Visual_attention]


FOC: See focus of contraction.


focal length: 1) The distance between 
the camera lens and the focal plane. 
2) The distance from a lens at which 
an object viewed at infinity would be 
in focus: [FP03:1.2.2] 


[Figure: light from infinity converging at the focal length of a lens]


focal plane: The plane on which an
image is focused by a lens system. Gen- 
erally this consists of an array of pho- 
tosensitive elements. See also image 
plane. [Hec87:5.2.3] 


focal point: The point on the optical 
axis of a lens where light rays from an 
object at infinity (also placed on the 
optical axis) converge: [FP03:1.2.2]


[Figure: focal point on the optical axis]


focal surface: A term most frequently 
used when a concave mirror is used 
to focus an image (e.g., in a reflector 
telescope). The focal surface in this 
case is the surface of the mirror: [WP: 
Focal_surface] 


[Figure: focal surface]


focus: Arranging for the focal points of
various image features to converge on 
the focal plane. An image is considered 
to be “in focus” if the main subject of 
interest is in focus: 


[Figure: in focus and out of focus images]


Note that focus (or lack of focus)
can be used to derive useful infor- 
mation (e.g., see depth from focus). 
[TV98:2.2.2] 


focus control: The control of the focus 
of a lens system usually by moving the 
lens along the optical axis or by adjust- 
ing the focal length. See also autofocus. 
[CGLL84] 


focus following: A technique for slowly 
changing the focus of a camera as an 
object of interest moves. See also depth 
from focus. [WP:Follow_focus] 


focus-invariant imaging: An imaging 
system that is designed to be invari- 
ant to focus. Such systems have a large 
depth of field. [BCD97] 


focus of attention (FOA): The feature, 
object or area to which the attention 
of a visual system is directed. [WP: 
Attention#Visual_attention] 


focus of contraction (FOC): The point 
of convergence of the optical flow vec- 
tors for a translating camera. The com- 
ponent of the translation along the 
optical axis must be nonzero. Compare 
focus of expansion. [JKS95:14.5.2] 


focus of expansion (FOE): The point 
from which all optical flow vectors 
appear to emanate in a static scene 
where the observer is moving. For 
example, if a camera system is mov- 
ing directly forwards along the optical 
axis then the optical flow vectors all 
emanate from the principal point (usu- 
ally near the center of the image): 
[FP03:10.1.3] 


[Figure: two images from a moving observer and the blended image]


FOE: See focus of expansion.


fold edge: A surface orientation discon- 
tinuity. An edge where two locally pla- 
nar surfaces meet. The figure shows a 
fold edge: [WB94] 


[Figure: fold edge]


foreground: In computer vision, generally
used in the context of object
recognition. The area of the scene or
image in which the object of interest
lies. See figure-ground separation.
[JKS95:2.5.1]


foreshortening: A typical perspective
effect whereby distant objects appear
smaller than closer ones. [FP03:4.1.1]


form factor: The physical size or
arrangement of an object. This term is
frequently used with reference to computer
boards. [FP03:5.5.2]


Förstner operator: A feature detection
operator used for corner detection
as well as other edge features.
[WP:Corner_detection#The_F.C3.B6rstner_corner_detector]


forward-backward algorithm: A special
case of belief propagation used
to carry out probabilistic inference in
a hidden Markov model (HMM) by
passing messages forwards and backwards
in the hidden Markov chain.
[Bis06:13.2.2]


forward-looking radar: A radar system
mounted on a vehicle looking
along the direction of travel. See also
side-looking radar. [KK01]


Fourier-Bessel transform: See Hankel
transform. [WP:Hankel_transform]


Fourier domain convolution: Convolution
in Fourier space involves
simply multiplication of the Fourier
transformed image by the Fourier transform
filter. For very large filters this
operation is much more efficient than
using the convolution operator in the
original domain. [BB82:2.2.4]


Fourier domain inspection: Identification
of defects based on features
in the Fourier transform of an image.
[TH99]


Fourier image processing: Image
processing in Fourier space (i.e., processing
images that have been transformed
using the Fourier transform).
[Umb98:2.5.4]


Fourier matched-filter object recognition:
Object recognition in which
correlation is determined using a
matched filter that is the conjugate
of the Fourier transform of the object
being located. [CDD94]


Fourier shape descriptor: A boundary
representation of a shape in terms of
the coefficients of a Fourier transformation.
[BB82:8.2.4]


Fourier slice theorem: A slice at an
angle $\theta$ of a 2D Fourier transform of an
object is equal to a 1D Fourier transform
of a parallel projection of the
object taken at the same angle. See
also slice-based reconstruction. [WP:
Projection-slice_theorem]


Fourier space: The frequency domain 
space in which an image (or other 
signal) is represented after appli- 
cation of the Fourier transform. 
[WP:Frequency_domain]


Fourier space smoothing: Application 
of a smoothing filter (e.g., to remove 
high-frequency noise) in a Fourier 
transformed image. [Umb98:2.5.4] 


Fourier transform: A transformation
that allows a signal to be considered
in the frequency domain as a sum
of sine and cosine waves or equivalently
as a sum of exponentials. For
a two-dimensional image,
$F(u, v) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\, e^{-2\pi i (xu + yv)}\, dx\, dy$. See
also fast Fourier transform, discrete
Fourier transform and inverse Fourier
transform. [FP03:7.3.1]
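
The continuous transform has a discrete, one-dimensional analogue that is easy to sketch. The example below uses a naive discrete Fourier transform (an assumption made for brevity, not the fast algorithm) to check that multiplying two spectra and inverting gives the same result as circular convolution in the original domain, which is the property exploited by Fourier domain convolution. All function names and signal values are illustrative.

```python
import cmath

def dft(signal):
    """Naive O(N^2) discrete Fourier transform of a 1D sequence."""
    n = len(signal)
    return [sum(signal[k] * cmath.exp(-2j * cmath.pi * u * k / n) for k in range(n))
            for u in range(n)]

def idft(spectrum):
    """Inverse of dft(), including the 1/N normalization."""
    n = len(spectrum)
    return [sum(spectrum[u] * cmath.exp(2j * cmath.pi * u * k / n) for u in range(n)) / n
            for k in range(n)]

def circular_convolve(f, g):
    """Circular convolution computed directly in the original domain."""
    n = len(f)
    return [sum(f[j] * g[(k - j) % n] for j in range(n)) for k in range(n)]

f = [1.0, 2.0, 3.0, 4.0]
g = [0.5, 0.5, 0.0, 0.0]

direct = circular_convolve(f, g)
product = [a * b for a, b in zip(dft(f), dft(g))]
via_fourier = [value.real for value in idft(product)]

print(direct)
print([round(v, 6) for v in via_fourier])
```

The direct route costs on the order of N operations per output sample, so for large filters the transform route becomes attractive once a fast Fourier transform replaces the naive sums used here.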


fovea: The high-resolution central region 
of the human retina. The analogous 
region in an artificial sensor emulates 
the retinal arrangement of photorecep- 
tors, e.g., a log-polar sensor. [FP03:1.3] 


foveal image: An image in which the 
sampled pattern is inspired by the 
arrangement of the human fovea, i.e., 
sampling is most dense in the image 
center and gets progressively sparser 
towards the periphery of the image. 
[WP:Foveal_imaging#Example_images]


foveation: 1) The process of creating a
foveal image. 
2) Directing the camera optical 
axis to a given direction. [WP: 
Foveated_imaging] 


fractal image compression: An image 
compression method based on exploit- 
ing self-similarity at different scales. 
[WP:Fractal_compression] 


fractal measure/dimension: A measure
of the roughness of a shape. Consider
a curve whose length ($L_1$ and $L_2$)
is measured at two scales ($S_1$ and $S_2$). If
the curve is rough, the length grows as
the scale increases. The fractal dimension
is $D = \frac{\log(L_1 - L_2)}{\log(S_2 - S_1)}$. [JKS95:7.4]


fractal representation: A representa- 
tion based on self-similarity. For exam- 
ple a fractal representation of an image 
could be based on the similarity of 
blocks of pixels. [IKMK95] 


fractal surface: A surface model that 
is defined progressively using fractals 
(i.e., the surface displays self-similarity 
at different scales). [Pen84] 


fractal texture: A texture representation 
based on self-similarity between scales. 
[JKS95:7.4] 


frame: 1) A complete standard televi- 
sion video image consisting of the even 
fields and the odd fields. 
2) A knowledge representation tech- 
nique suitable for recording a related 
set of facts, rules of inference, precon- 
ditions etc. [TV98:8.1] 


frame buffer: A device that stores a 
video frame for access, display and 
processing by a computer. For exam- 
ple, such devices are used to store 
the frame from which a video dis- 
play is refreshed. See also frame store. 
[TV98:2.3.1] 


frame differencing: A technique for 
detecting changes in a scene by sub- 
tracting (usually) consecutive frames 
from a video and also possibly taking 
the absolute value of the resulting dif- 
ference image. Regions of the differ- 
ence image with high amplitude are 
assumed to be in the salient region. 
[MMN06]
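
Frame differencing can be sketched in a few lines for grayscale frames stored as nested lists. The threshold that decides what counts as "high amplitude" is an assumption introduced here for illustration, as are the function name and the pixel values.

```python
def frame_difference(prev_frame, next_frame, threshold):
    """Return a binary change mask: 1 where |next - prev| exceeds the threshold."""
    mask = []
    for row_prev, row_next in zip(prev_frame, next_frame):
        mask.append([1 if abs(b - a) > threshold else 0
                     for a, b in zip(row_prev, row_next)])
    return mask

# Two tiny illustrative frames: one pixel changes strongly, one only slightly.
frame_a = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
frame_b = [[10, 10, 10], [10, 200, 10], [10, 12, 10]]

change_mask = frame_difference(frame_a, frame_b, threshold=20)
for row in change_mask:
    print(row)
```

Taking the absolute value makes the mask insensitive to whether a pixel became brighter or darker; the small change of 2 gray levels falls below the assumed threshold and is ignored.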


frame grabber: See frame store. 


frame of reference: A coordinate 
system defined with respect to some 
object, the camera or the real world: 
[WP:Frame_of_reference] 


[Figure: coordinate frames, including the world Z axis]


frame rate: The number of images taken, 
broadcast or digitally processed by a 
system per second. Frequently quoted 
in units of frames per second (fps) and 
used as a performance metric in rela- 
tion to computational efficiency of a 
given implementation. [SB11:Ch. 1] 


frame store: An electronic device for 
recording a frame from an imaging sys- 
tem. Typically such devices are used as 
the interface between a CCIR camera 
and a computer. [Dav90:2.2] 


freeform surface: A surface that does 
not follow any particular mathematical 
form; e.g., the folds of a piece of fabric: 
[BM02:4.1] 


Freeman code: A type of chain code
in which a contour is represented by
coordinates for the first point followed
by a series of direction codes (typically
0 through 7). The figure shows (left)
the Freeman codes relative to the center
point and (right) an example of the
codes derived from a chain of points:
[Jai89:9.6]


[Figure: (left) Freeman direction codes around a center point; (right) example chain with codes 0, 0, 2, 3, 1, 0, 7, 7, 6, 0, 1, 2, 2, 4]
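
A Freeman code can be sketched as a lookup between unit steps and direction codes. The numbering convention below (code 0 along +x, codes increasing counter-clockwise, +y pointing up) is an assumption made for illustration; the figure in the source may use a different orientation, and image coordinates often have y pointing down.

```python
# Assumed convention: code 0 points along +x and codes increase
# counter-clockwise in steps of 45 degrees, with +y pointing up.
DIRECTIONS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]

def freeman_encode(points):
    """Return (start_point, codes) for a chain of 8-connected integer points."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # index() raises ValueError if two successive points are not neighbors.
        codes.append(DIRECTIONS.index((x1 - x0, y1 - y0)))
    return points[0], codes

def freeman_decode(start, codes):
    """Rebuild the list of points from the start point and direction codes."""
    points = [start]
    for code in codes:
        dx, dy = DIRECTIONS[code]
        x, y = points[-1]
        points.append((x + dx, y + dy))
    return points

chain = [(0, 0), (1, 0), (2, 1), (2, 2), (1, 3), (0, 3)]
start, codes = freeman_encode(chain)
print(start, codes)
```

Storing one start point plus a 3-bit code per step is more compact than storing every coordinate pair, and decoding recovers the original chain exactly.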


Frenet frame: A triplet of mutually 
orthogonal unit vectors (the curve 
normal, the curve tangent and the 
curve binormal or curve bitangent) 
describing a point on a curve: 
[BB82:9.3.1] 


[Figure: Frenet frame, showing the curve normal]


frequency domain filter: A filter
defined by its action in Fourier space. 
See high-pass filter and low-pass filter. 
[Umb98:3.4] 


frequency spectrum: The range 
of (electromagnetic) frequencies. 
[Hec87:7.8] 


Fresnel diffraction: The apparent bend- 
ing and fringed structuring of light 
which appears to occur close to an 
aperture. [Sha06:p. 357]


Fresnel equations: The equations that 
govern the changes in light when mov- 
ing across an interface between two 
materials of different refractive indices. 
The particular equations correspond- 
ing to ratios of electric field amplitudes 
are also known as the Fresnel equa- 
tions. [Hec87:pp. 433-499] 


Fresnel lens: A lens created from 
many small prisms of varying index of 
refraction often used in lighthouse illu- 
minators. A Fresnel lens has a much 
lower bulk than a conventional plano- 
convex lens of equivalent power. To 
make the lens, the non-refractive parts 
of a conventional lens (dark in the figure)
are removed:


front lighting: A general term covering
methods of lighting a scene where the
lights are on the same side of the object
as the camera. As an alternative
consider back lighting: [WP:Frontlight]


[Figure: front lighting, with the light on the camera side of the object]


front porch: In analog signals, the prelude
to the horizontal synchronization
signal. The figure shows it in the con- 
text of the anatomy of a typical analog 
video signal: [Car01]


frontal presentation: A planar surface
parallel to the image plane. [TSK01]


frontier point: Point on an object
surface where the surface normal is 
known. This can be exploited to infer 
the surface reflectance and the light 
distribution in various circumstances. 
Frontier points may also be used to 
recover surface shape in stereo applications.
[VFC05]


full primal sketch: A representation 
described as part of Marr’s theory of 
vision, that is made up of the raw 
primal sketch primitives together with 
grouping information. The sketch con- 
tains described image structures that 
could correspond with scene struc- 
tures (e.g., image regions with scene 
surfaces). [MH80] 


function-based model: An object 
representation based on functionality 
(e.g., the object’s purpose or the way 
in which it moves and interacts with 
other objects) rather than its geomet- 
ric properties. [SB94] 


function-based recognition: Object 
recognition based on object function- 
ality rather than geometric proper- 
ties. See also function-based model. 
[RDR94] 


functional optimization: An analytical 
technique for optimizing (maximizing 
or minimizing) complex functions of 
continuous variables. [Hor86:6.13] 


functional representation: See 
function-based model. [WP:Function_ 
representation] 


fundamental form: A metric useful 
in determining local properties of 
surfaces. See also surface curvature. 
[Fau93:C.3] 


fundamental matrix: A bilinear relationship
between corresponding
points ($\mathbf{u}$, $\mathbf{u}'$) in binocular stereo
images. The fundamental matrix, $F$,
incorporates the two sets of camera
parameters ($K$, $K'$) and the relative
position ($\mathbf{t}$) and orientation ($R$) of
the cameras. Matching points $\mathbf{u}$ from
one image and $\mathbf{u}'$ from the other
image satisfy $\mathbf{u}^{\top} F \mathbf{u}' = 0$, where $S(\mathbf{t})$ is
the skew symmetric matrix of $\mathbf{t}$ and
$F = (K^{-1})^{\top} S(\mathbf{t}) R^{-1} (K')^{-1}$. See also the
essential matrix. [TV98:7.3.4]
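
The constraint $\mathbf{u}^{\top} F \mathbf{u}' = 0$ with $F = (K^{-1})^{\top} S(\mathbf{t}) R^{-1} (K')^{-1}$ can be checked numerically with a minimal sketch. It assumes a pinhole model and the pose convention $P' = R(P - \mathbf{t})$ for moving a point from the first camera frame to the second; that convention, the helper names and all numerical values are assumptions made for illustration.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

def inverse(a):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a00, a01, a02), (a10, a11, a12), (a20, a21, a22) = a
    adj = [[a11 * a22 - a12 * a21, a02 * a21 - a01 * a22, a01 * a12 - a02 * a11],
           [a12 * a20 - a10 * a22, a00 * a22 - a02 * a20, a02 * a10 - a00 * a12],
           [a10 * a21 - a11 * a20, a01 * a20 - a00 * a21, a00 * a11 - a01 * a10]]
    det = a00 * adj[0][0] + a01 * adj[1][0] + a02 * adj[2][0]
    return [[value / det for value in row] for row in adj]

def skew(t):
    """Skew symmetric matrix S(t) such that S(t) v equals the cross product t x v."""
    return [[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]]

def fundamental_matrix(K, K_prime, R, t):
    """F = (K^-1)^T S(t) R^-1 (K')^-1."""
    left = matmul(transpose(inverse(K)), skew(t))
    return matmul(matmul(left, inverse(R)), inverse(K_prime))

def project(K, p):
    """Pinhole projection of a 3D point, returned as a homogeneous pixel vector."""
    q = matvec(K, p)
    return [q[0] / q[2], q[1] / q[2], 1.0]

# Illustrative (hypothetical) calibration and pose values.
K = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
K_prime = [[700.0, 0.0, 300.0], [0.0, 700.0, 200.0], [0.0, 0.0, 1.0]]
c, s = math.cos(0.1), math.sin(0.1)
R = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
t = [0.5, 0.0, 0.1]

P = [0.3, -0.2, 4.0]                                  # scene point, first camera frame
P_prime = matvec(R, [P[i] - t[i] for i in range(3)])  # same point, second camera frame
u = project(K, P)
u_prime = project(K_prime, P_prime)

F = fundamental_matrix(K, K_prime, R, t)
Fu = matvec(F, u_prime)
residual = sum(u[i] * Fu[i] for i in range(3))
print(f"u^T F u' = {residual:.2e}")
```

The residual vanishes up to rounding because the two viewing rays and the baseline $\mathbf{t}$ are coplanar; a pair of points that do not correspond would in general give a nonzero value.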


fusion: Integration of data from multiple 
sources into a single representation. 
[SQ04:18.5] 


fuzzy logic: A form of logic that allows 
a range of possibilities between true 
and false (i.e., a degree of truth). [WP: 
Fuzzy_logic] 


fuzzy morphology: A type of 
mathematical morphology operation 
that is based on fuzzy logic rather than 
the more conventional Boolean logic. 
[KN00:1.1]


fuzzy reasoning: See fuzzy logic. 


fuzzy set: A grouping of data where each 
item in the group has an associated 
grade or likelihood of membership in 
the set. [WP:Fuzzy_set] 


fuzzy similarity measure: A non-finite
analog to a similarity measure when
evaluating pairs of fuzzy features. The
most obvious way of calculating the
similarity is based on the inter-feature
distances using the fuzzy equivalent of
the most usually employed finite distance
measures. Several have been proposed
as follows. If $d$ is the distance
measure between two fuzzy sets $A$ and
$B$ on a universe $X$, then:

• Koczy: $S(A, B) = \frac{1}{1 + d(A, B)}$;

• Williams and Steele: $S(A, B) = e^{-\alpha d(A, B)}$,
where $\alpha$ is the steepness measure;

• Santini: $S(A, B) = 1 - \beta d(A, B)$,
where $\beta = 1, 2, \infty$.

[JK05]


Gabor filter: A filter formed by multiplying
a complex oscillation by an elliptical
Gaussian distribution (specified by
two standard deviations and an orienta- 
tion). This creates filters that are local, 
selective for orientation, have differ- 
ent scales and are tuned for intensity 
patterns (e.g., edges, bars and other 
patterns observed to trigger responses 
in the simple cells of the mammalian 
visual cortex) according to the fre- 
quency chosen for the complex oscil- 
lation. The filter can be applied in the 
frequency domain as well as the spatial 
domain. [FP03:9.2.2] 
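
The construction of a Gabor filter (a complex oscillation multiplied by an elliptical Gaussian) can be sketched as a sampled spatial-domain kernel. The parameterization below, with two standard deviations, an orientation and a frequency in cycles per pixel, is one common choice and is an assumption here, as are the function name and the example values.

```python
import cmath
import math

def gabor_kernel(radius, sigma_x, sigma_y, theta, frequency):
    """Sample a complex Gabor kernel on a (2*radius + 1)^2 grid.

    The Gaussian envelope has standard deviations sigma_x and sigma_y along
    axes rotated by theta; the oscillation runs along the rotated x axis.
    """
    kernel = []
    for y in range(-radius, radius + 1):
        row = []
        for x in range(-radius, radius + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-0.5 * ((xr / sigma_x) ** 2 + (yr / sigma_y) ** 2))
            carrier = cmath.exp(2j * math.pi * frequency * xr)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

kernel = gabor_kernel(radius=3, sigma_x=2.0, sigma_y=1.0, theta=0.0, frequency=0.25)
print(kernel[3][3], abs(kernel[3][4]))
```

The real part of the kernel responds to bar-like patterns and the imaginary part to edge-like patterns at the chosen orientation; changing `frequency` and the standard deviations tunes the scale.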


Gabor transform: A transformation that 
allows a 1D or 2D signal (such as 
an image) to be represented as a 
weighted sum of Gabor functions. 
[NA05:2.7.3] 


Gabor wavelets: A type of wavelet 
formed by a sinusoidal function that 
is restricted by a Gaussian enve- 
lope function. [NA05:2.7.3] [WP: 
Gabor_wavelet#Wavelet_space] 


gait analysis: Analysis of the way in 
which human subjects move. Fre- 
quently used for biometric or medical 
purposes: [WP:Gait_analysis] 


gait classification: 1) Classification of 
different types of human motion (such 
as walking or running). 
2) Biometric identification of people 
based on their gait parameters. [WP: 
Gait#Energy-based_gait_classification] 


Galerkin approximation: A method 
for determining the coefficients of a 
power series solution for a differential 
equation. [Whe73] 


gallery image: Exemplar image used 
to test the image retrieval or object 
recognition abilities of image search 
algorithms. 


gamma: Devices such as cameras and
displays that convert between analog
(denoted $a$) and digital ($d$) images generally
have a nonlinear relationship
between $a$ and $d$. A common model
for this nonlinearity is that the signals
are related by a gamma curve
of the form $a = c \times d^{\gamma}$, for some constant
$c$. For CRT displays, common
values of $\gamma$ are in the range 1.0-2.5.
[BB82:2.3.1]
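
The gamma curve $a = c \times d^{\gamma}$ and its inverse can be sketched as follows, assuming values normalized to the range 0 to 1. The function names and the value $\gamma = 2.2$ are illustrative assumptions.

```python
def apply_gamma(d, gamma, c=1.0):
    """Gamma curve a = c * d**gamma for a normalized digital value d in [0, 1]."""
    return c * d ** gamma

def gamma_correct(a, gamma, c=1.0):
    """Invert the curve: recover d from a, i.e. d = (a / c)**(1 / gamma)."""
    return (a / c) ** (1.0 / gamma)

# A mid-gray digital value shown on a display with an assumed gamma of 2.2.
displayed = apply_gamma(0.5, 2.2)
recovered = gamma_correct(displayed, 2.2)
print(f"displayed = {displayed:.3f}, recovered = {recovered:.3f}")
```

Because the curve is nonlinear, a mid-gray input is displayed noticeably darker than half intensity, which is what applying the inverse curve beforehand compensates for.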


gamma correction: The correction of 
brightness and color ratios so that an 
image has the correct dynamic range 
when displayed on a monitor. [WP: 
Gamma_correction] 


gamma distribution: The probability
density function of a random variable
$X$ defined on $[0, \infty)$ obeying a $\Gamma(\alpha, \beta)$
distribution is $p(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x}$,
where $\alpha$ is a shape parameter and
$\beta$ is an inverse scale parameter. The
chi-squared distribution with $\nu$ degrees
of freedom is a special case of the
gamma distribution with $\alpha = \nu/2$, $\beta =
1/2$. [Bis06:p. 688]


gamut mapping: Functional mapping of 
the color-space coordinates of a source 
image to color-space coordinates of a 
reproduction in order to compensate 
for differences in the source and out- 
put medium color gamut capability. 
[ISO04] 


gauge coordinates: A coordinate sys- 
tem local to the image surface itself. 
Gauge coordinates provide a conve- 
nient frame of reference for opera- 
tors such as the gradient operator. 
[FRKV92] 


gauging: Measuring or testing. A 
standard requirement of indus- 
trial machine vision systems. [WP: 
Gauge_(instrument)] 


Gauss map: In three dimensions, a mapping
of an object’s surface normals
onto the unit sphere. In higher dimensions,
it is a mapping of hyper-surfaces
in $\mathbb{R}^n$ onto the unit sphere $S^{n-1} \subset \mathbb{R}^n$.
Also known as the extended Gaussian
image (EGI). [WP:Gauss_map]


Gaussian convolution: See Gaussian 
smoothing. 


Gaussian curvature: A measure of the 
surface curvature at a point. It is the 
product of the maximum and mini- 
mum of the normal curvatures in all 
directions through the point. See also 
mean curvature. [FP03:19.1.2] 


Gaussian derivative: The combina- 
tion of Gaussian smoothing and a 
gradient filter. This results in a gradient 
filter that is less sensitive to noise: 
[FP03:8.2.1] 


[Figure: original image, normal first derivative and Gaussian first derivative]
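
The combination of Gaussian smoothing and a gradient filter can be sketched in one dimension by sampling the derivative of a Gaussian and convolving it with a signal. The function names, the kernel radius and the test signal are assumptions chosen for illustration.

```python
import math

def gaussian_kernel(sigma, radius):
    """Sampled Gaussian mask, normalized so that the weights sum to 1."""
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def gaussian_derivative_kernel(sigma, radius):
    """Sampled derivative of the Gaussian: g'(x) = -x / sigma^2 * g(x)."""
    g = gaussian_kernel(sigma, radius)
    return [-x / (sigma * sigma) * g[x + radius] for x in range(-radius, radius + 1)]

def convolve_valid(signal, kernel):
    """1D convolution, keeping only positions where the kernel fits fully."""
    k = len(kernel)
    flipped = kernel[::-1]
    return [sum(signal[i + j] * flipped[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# An ideal step edge between samples 9 and 10.
step = [0.0] * 10 + [1.0] * 10
response = convolve_valid(step, gaussian_derivative_kernel(sigma=1.0, radius=3))
peak = response.index(max(response))
print(f"peak response {max(response):.3f} at output index {peak}")
```

The response is zero in the flat regions and peaks at the edge; a larger `sigma` averages over more samples, which reduces sensitivity to noise at the cost of a broader peak.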


Gaussian distribution: A probability
density function with this distribution:

$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$

where $\mu$ is the mean and $\sigma$ is
the standard deviation. The Gaussian
is also known as the normal distribution.
See also the multi-variate
normal distribution. [Bis06:2.3]


Gaussian-Hermite moment: Let
$f(x, y)$ denote the intensity function
of the image defined in $\Omega$. Then
the $(p, q)$ order Gaussian-Hermite
moment of the image is:

$M_{pq} = \iint_{\Omega} f(x, y)\, \hat{H}_p(x; \sigma)\, \hat{H}_q(y; \sigma)\, dx\, dy$

where $\hat{H}_p(x; \sigma)$ and $\hat{H}_q(y; \sigma)$ are
the corresponding Gaussian-Hermite
polynomial functions with scale
parameter $\sigma$. [YP11]


Gaussian mixture model: A mixture 
model where each of the compo- 
nents is a Gaussian distribution. For 
instance, used to represent color his- 
tograms with multiple peaks. See also 
expectation maximization. [Bis06:9.2] 


Gaussian noise: Noise whose distribution
is Gaussian in nature. Gaussian
noise is specified by its standard deviation
about a zero mean, and is often
modeled as a form of additive noise:
[TV98:3.1.1]


Gaussian process: A collection of 
random variables (i.e., a stochastic 
process), any number of which have 
a joint Gaussian distribution. A Gaus- 
sian process is completely defined by 
its mean and covariance functions. 
[RW06:2.2] 


Gaussian process classification: A
nonlinear generalization of the logistic
regression model for two-class classification,
with $p(C_1|\mathbf{x}) = \sigma(f(\mathbf{x}))$,
where $\sigma$ denotes the logistic sigmoid
function $\sigma(z) = (1 + e^{-z})^{-1}$ and $f(\mathbf{x})$
has a Gaussian process prior over functions.
The model can also be generalized
to multiclass classification problems.
[RW06:Ch. 3]


Gaussian process latent variable
model (GP-LVM): A latent variable 
model for dimensionality reduction. 
The nonlinear mapping between the 
low-dimensional latent space and the 
data space is modeled using Gaussian 
process regression. [Mur12:13.5] 


Gaussian process regression: A
Bayesian approach to regression
where a Gaussian process (GP) prior is
placed directly over functions. Assuming
that the observations are corrupted
by Gaussian noise, the posterior over
functions is also a Gaussian process.
With any given test input this gives rise
to a predicted mean and variance. The
figure shows a one-dimensional input
space with function values observed
at four locations (+):


[Figure: GP posterior plotted against the input, x]


The colored lines show three random
samples drawn from the posterior GP
and the grey shading denotes the two
standard deviation error bars around
the mean. The mean prediction of a
GP has the same form as in kernel
ridge regression, but the latter method
does not produce a predictive variance.
[RW06:Ch. 2]


Gaussian pyramid: A multi-resolution
representation of an image formed by
several images, each one a subsampled
and Gaussian-smoothed version of the
original one at increasing standard
deviation: [WP:Gaussian_pyramid]


Gaussian smoothing: An
image-processing operation aimed
at attenuating image noise computed
by a convolution operator with a
mask sampling a Gaussian distribution:
[TV98:3.2.2]


[Figure: Gaussian smoothed images: original image, sigma = 1.0 and sigma = 3.0]


Gaussian speckle: Speckle that has a
Gaussian distribution. [LWW83]


Gaussian sphere: A sampled representation
of a unit sphere where the
surface of the sphere is defined by a
number of triangular patches (often
computed by dividing a dodecahedron).
See also extended Gaussian
image: [Nal93:9.2.5]


gaze control: The ability of a human
subject or a robot head to control the 
direction of gaze. [Hen03] 


gaze direction estimation: Estimation 
of the direction in which a human 
subject is looking. Used for human- 
computer interaction: [NMRZ00] 


gaze direction tracking: Continuous 
gaze direction estimation (e.g., in a 
video sequence or a live camera feed). 
[MZ00] 


gaze location: See gaze direction
estimation.


generalized cone: A generalized 
cylinder in which the swept curve 
changes along the axis. [Nal93:9.2.3] 


generalized curve finding: A gen- 
eral term referring to methods that 
locate arbitrary curves. For exam- 
ple, see generalized Hough transform. 
[Dav90:10] 


generalized cylinder: A volumetric 
representation where the volume is 
defined by sweeping a closed curve 
along an axis. The axis does not need 
to be straight and the closed curve 
may vary in shape as it is moved 
along the axis. For example a cylin- 
der may be defined by moving a cir- 
cle along a straight axis, and a cone 
may be defined by moving a circle 
of changing diameter along a straight 
axis: [FP03:24.2.1] 


generalized eigenvalue: See 
generalized eigenvalue problem. 


generalized eigenvalue problem: Let
A = [a_jk] be an n × n matrix and
consider the vector equation Ax = λx
where λ is a number. A value of λ for
which this equation has a non-trivial
solution x is called an eigenvalue of the
matrix A and the corresponding solutions
(by substitution of λ) are called
the eigenvectors of A. The set of all
eigenvalues is called the spectrum of A
and the largest of their absolute values
is known as the spectral radius of A. The
problem of finding the eigenvectors and
eigenvalues of A is known as the
eigenvalue problem; the generalized
eigenvalue problem replaces the equation
with Ax = λBx for a second n × n matrix B
and reduces to the above when B is the
identity. [Kre83:pp. 345-346]
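To make the equation Ax = λx concrete, the following sketch estimates the eigenvalue of largest magnitude and its eigenvector by power iteration. Power iteration is a technique chosen for this illustration, not one described in the entry, and the matrix and function names are assumptions.

```python
def matvec(A, x):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def power_iteration(A, iterations=100):
    """Estimate the dominant eigenvalue and eigenvector of A."""
    x = [1.0] + [0.0] * (len(A) - 1)      # arbitrary non-zero start vector
    for _ in range(iterations):
        y = matvec(A, x)
        scale = max(abs(v) for v in y)    # rescale to avoid overflow
        x = [v / scale for v in y]
    Ax = matvec(A, x)
    lam = sum(a * b for a, b in zip(Ax, x)) / sum(b * b for b in x)
    return lam, x

A = [[4.0, 1.0], [2.0, 3.0]]              # eigenvalues 5 and 2
lam, x = power_iteration(A)
residual = max(abs(a - lam * b) for a, b in zip(matvec(A, x), x))
```

For this matrix the spectrum is {5, 2}, so the estimate converges to the spectral radius 5, and the residual of Ax = λx goes to zero.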


generalized Hough transform: A ver- 
sion of the Hough transform capable 
of detecting the presence of arbitrary 
shapes. [Dav90:10] 


generalized mosaic: A simple and effec- 
tive method for extracting additional 
information at each scene point. As 
the camera moves, it senses each 
scene point multiple times. Fusing the 
data from multiple images yields an 
image mosaic that includes additional 
information about the scene, such as 
extended dynamic range, higher
spectral quality, polarization, focus and
distance sensing. [SN01a]


generalized order statistics filter: A 
filter in which the values within the 
filter mask are considered in increas- 
ing order and then combined in some 
fashion. The most common such filter 
is the median filter, which selects the 
middle value. [CY92] 
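A minimal one-dimensional sketch of an order statistics filter follows. The window values are sorted and then passed to a combining function; choosing the middle value gives the median filter, choosing the first gives a minimum filter. The function names, the signal, and the border handling (truncated windows) are assumptions for illustration.

```python
def order_statistic_filter(signal, width, combine):
    """Sort each window in increasing order and reduce it with `combine`."""
    half = width // 2
    n = len(signal)
    out = []
    for i in range(n):
        window = sorted(signal[max(0, i - half): min(n, i + half + 1)])
        out.append(combine(window))
    return out

def median(window):
    """Middle value of a sorted window (upper median for even lengths)."""
    return window[len(window) // 2]

noisy = [10, 10, 200, 10, 10, 12, 11, 0, 11]   # two impulse outliers: 200 and 0
filtered = order_statistic_filter(noisy, 3, median)
minimum = order_statistic_filter(noisy, 3, lambda w: w[0])
```

The median removes both impulses, whereas the minimum filter spreads the low outlier over its neighbors.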


generate and test: See hypothesize and 
verify. [WP:Trial_and_error] 


generative model: A model representa- 
tion that, explicitly or implicitly, mod- 
els the distribution of inputs and out- 
puts such that sampling from them 
facilitates the generation of a syn- 
thetic example in the input space. 
[Bis06:1.5.4]


generic viewpoint: A viewpoint such 
that small motions may cause small 
changes in the size or relative positions 
of features, but no features appear 
or disappear. This contrasts with a 
privileged viewpoint. [WP:Neuroes- 
thetics#The_Generic_Viewpoint] 


genetic algorithm: An optimiza- 
tion algorithm seeking solutions by 
refining iteratively a small set of
candidates with a process mimicking 
genetic evolution. The suitability 
(fitness) of a set of possible solutions 
(a population) is used to generate a 
new population until conditions are 
satisfied (e.g., the best solution has 
not changed for a given number of 
iterations). [WP:Genetic_algorithm] 
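The loop of fitness evaluation, selection and regeneration can be sketched on a toy problem. Everything specific here is an assumption chosen for brevity: the "count the 1 bits" objective, truncation selection, one-point crossover, the mutation rate, and a fixed number of generations as the stopping condition.

```python
import random

def fitness(bits):
    """Toy objective: the number of 1 bits in the candidate."""
    return sum(bits)

def evolve(pop_size=20, length=16, generations=40, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    initial_best = max(fitness(p) for p in population)
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]      # selection: keep the fitter half
        children = [parents[0][:]]                 # elitism: the best survives unchanged
        while len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, length)         # one-point crossover
            child = mum[:cut] + dad[cut:]
            child = [b ^ 1 if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        population = children
    best = max(population, key=fitness)
    return initial_best, best

initial_best, best = evolve()
```

Because the best candidate is always copied into the next population, the best fitness can never decrease from one generation to the next.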


genetic programming: Application of 
genetic algorithms to evolve programs 
that satisfy some evaluation criteria. 
[WP:Genetic_programming] 


genus: In the study of topology, the num- 
ber of “holes” in a surface. In computer 
vision, sometimes used as a discrimi- 
nating feature for simple object recog- 
nition. [WP:Genus] 


Gestalt: German for “shape”. The 
Gestalt school of psychology, led by 
Wertheimer, Kohler and Koffka in the 
first half of the 20th century, had a 
profound influence on perception the- 
ories and, subsequently, on computer 
vision. Its basic tenet was that a percep- 
tual pattern has properties as a whole, 
which cannot be explained in terms 
of its individual components. In other 
words, the whole is more than the sum 
of its parts. This concept was captured 
in some basic laws (proximity, simi- 
larity, closure, “common destiny” or 
good form and saliency) that would 
apply to all mental phenomena, not 
just perception. Much work on low- 
level computer vision, most notably on 
perceptual grouping and perceptual 
organization, has exploited these ideas. 
See also visual illusion. [FP03:14.2] 


geodesic: The shortest line between two 
points (on a mathematically defined 
surface). [Jai89:3.10] 


geodesic active contour: An active 
contour model similar to the snake 
model in that it attempts to minimize 
an energy function between the model 
and the data, but which also incorpo- 
rates a geometrical model: [CKS97] 


[Figure: initial contour and final contour]


geodesic active region: A technique for 
region-based segmentation that builds 
on geodesic active contours by adding 
a force that takes into account infor- 
mation within regions. Typically a 
geodesic active region will be bounded 
by a single geodesic active contour. 
[PD02] 


geodesic distance: The length of the 
shortest path between two points 
along some surface. This is different 
from the Euclidean distance, which 
takes no account of the surface. The 
figure shows the geodesic distance 
between Calgary and London (following
the curvature of the earth):
[WP:Distance_(graph_theory)]
[Figure: geodesic path between Calgary and London]
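For the Calgary to London example, the geodesic on a sphere is the great-circle arc. The sketch below compares it with the straight-line (Euclidean) chord through the Earth. The spherical Earth model, the mean radius and the approximate city coordinates are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean radius; the Earth is treated as a sphere

def great_circle_km(lat1, lon1, lat2, lon2):
    """Geodesic (great-circle) distance via the haversine formula."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def chord_km(lat1, lon1, lat2, lon2):
    """Euclidean distance along the straight chord through the sphere."""
    central_angle = great_circle_km(lat1, lon1, lat2, lon2) / EARTH_RADIUS_KM
    return 2 * EARTH_RADIUS_KM * math.sin(central_angle / 2)

CALGARY = (51.05, -114.07)   # approximate latitude, longitude in degrees
LONDON = (51.51, -0.13)
geodesic = great_circle_km(*CALGARY, *LONDON)
euclidean = chord_km(*CALGARY, *LONDON)
```

The geodesic comes out at roughly 7000 km and is longer than the chord, since the chord ignores the surface.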


geodesic transform: Assigns to each 
point the geodesic distance to 
some feature or class of feature. 
[SHB08:11.5.6] 


geographic information system 
(GIS): A computer system that stores 
and manipulates geographically refer- 
enced data (such as images of portions 
of the earth taken by satellite). [WP: 
Geographic_information_system] 




geometric algebra: A mathematical lan- 
guage whose function is to unify 
diverse mathematical formalisms in 
order to more easily express physi- 
cal ideas. The geometric algebra of 3D 
space, e.g., is a powerful tool for solv- 
ing problems in classical mechanics 
since it has a very clear and compact 
method for encoding rotations. [DL07] 


geometric compression: The compres- 
sion of geometric structures, such as 
polygons. [TR98] 


geometric constraint: A limitation on 
the possible physical arrangement or 
appearance of objects based on geom- 
etry. These types of constraint are 
used extensively in stereo vision (the
epipolar constraint), motion analy- 
sis (the rigid motion constraint) and 
object recognition (focusing on spe- 
cific classes of objects or relations 
between features). [Fau93:6.2.6] 


geometric correction: In remote 
sensing, an algorithm or technique 
for correction of geometric distortion. 
[Jai89:8.16] 


geometric deformable model: A 
deformable model in which the defor- 
mation of curves is based on the level 
set method and stops at object bound- 
aries. A typical example is a geodesic 
active contour model. [HXP03] 


geometric distance: In curve fitting and 
surface fitting, the shortest distance 
from a given point to a given surface. In 
many fitting problems, the geometric 
distance is expensive to compute but 
yields more accurate solutions. Com- 
pare algebraic distance. [HZ00:3.2.2] 


geometric distortion: Deviations from 
the idealized image formation model 
(e.g., pinhole camera) of an imag- 
ing system. Examples include radial 
lens distortion in standard cameras. 
[B05] 


geometric feature: A general term 
describing a shape characteristic of 
some data that encompasses features 
such as edges, corners, geons etc. 
[WP:Feature_extraction] 


geometric feature learning: Learn- 
ing geometric features from exam- 
ples of the feature. [WP:Geometric_ 
feature_learning] 


geometric feature proximity: A mea- 
sure of the distance between geomet- 
ric features, e.g., the distance between 
data and overlaid model features in 
hypothesize and verify. [HZLM01] 


geometric hashing: A technique for 
matching models in which some 
geometric invariant features are 
mapped into a hash table that is 
used to perform the recognition. 
[BM02:4.5.4] 


geometric heat flow: Evolution-based 
image segmentation approach using a 
heat flow analogy where the
agglomeration motion is orthogonal to the
edge normals. May also be used as a 
replacement for image smoothing and 
image enhancement since it is similar 
to anisotropic diffusion. [Sap06:2.3] 


geometric invariant: A quantity 
describing some geometric configu- 
ration that remains unchanged under 
certain transformations (e.g., cross 
ratio and perspective projection). 
[WP:Geometric_invariant_theory] 


geometric model: A model that 
describes the geometric shape of 
some object or scene. A model can 
be 2D (e.g., a polycurve) or 3D 
(e.g., a surface-based model). [WP: 
Geometric_modeling] 


geometric model matching: Compari- 
son of two geometric models or of a 
model and a set of image data shapes,
for the purposes of object recognition. 
[RH95] 


geometric morphometrics: Statistical 
tools for analyzing information about 
the relative geometry of organisms, 
specifically the changes in that geome- 
try brought about by various processes. 
[ARS04] 


geometric optics: A general term refer- 
ring to the description of optics from 
a geometrical point of view. Includes 
concepts such as the simple pinhole 
camera model, magnification, lens etc. 
[Hec87:Ch. 3] 


geometric reasoning: Reasoning with 
geometric shapes in order to address 
such tasks as robot motion planning, 
shape similarity, spatial position esti- 
mation etc. [MK90] 


geometric representation: See 
geometric model. [WP:RGB_color_ 
model#Geometric_representation] 


geometric shape: A shape that takes 
a relatively simple geometric form 
(such as a square, ellipse, cube, 
sphere, generalized cylinder etc.) or 
that can be described as a combina- 
tion of such geometric primitives. [WP: 
Geometric_primitive] 


geometric transformation: A class 
of image-processing operations that 
transform the spatial relationships 
in an image. They are used for the 
correction of geometric distortions
and general image manipulation. A 
geometric transformation requires 
the definition of a pixel coordinate 
transformation together with an 
interpolation scheme. The figure 
shows a rotation: [Umb98:3.5]


geon: GEometrical iON. A basic 
volumetric representation primi- 
tive proposed by Biederman and 
used in recognition by components.
Some example geons are:
[WP:Geon_(psychology)]


gesture analysis: Basic analysis of video 
data representing human gestures pre- 
ceding the task of gesture recognition. 
[WP:Gesture_recognition] 


gesture-based user interface: Various 
methods of communicating commands 
to electronic devices which are based 
on capturing and analyzing human 
motion, either facial or physical. It is 
critical that the analysis takes place at 
a very rapid rate in order to avoid lag 
between the command being issued 
and acted upon. [SPHK08] 


gesture component space: Once a 
video sequence is split into atomic 
actions, the gestures which took place 
can be represented by a mixture 
model. To determine automatically 
how many typical groups exist, they 
can be represented in the gesture com- 
ponent space, where the axes are the 
principal components of the motion. 
[GX11:Ch. 5] 


gesture recognition: The recognition of
human gestures, generally for the pur- 
pose of human-computer interaction. 
See also hand sign recognition:
[WP:Gesture]


gesture segmentation: The 
segmentation of a video sequence 
into a stream of atomic actions which 
can be mapped to a user-specified 
dictionary of pre-arranged templates. 
Gesture recognition can be performed 
on the segments to identify what they 
convey. [GX11:Ch. 5] 


gesture tracking: Computing the ges- 
ture trajectory of a human (or ani- 
mal) participant through the gesture 
component space. [GX11:Ch. 5] 


ghost artifacts: Appearance of 
unwanted secondary (often weaker) 
signal artifacts superimposed on the 
main signal. In imaging this can be 
for a wide range of reasons including 
unintended optical, electrical, expo- 
sure or sensing effects. [Hen07:Ch. 
11] 


Gibbs point process: Models of how 
point-like objects (e.g., the particles 
in a gas) arrange themselves into a 
state of equilibrium. The simplest form 
is that of pairwise interaction since 
it is defined by a single parameter. 
[Hei94] 


Gibbs sampling: A method for proba- 
bilistic inference based on transition 
probabilities (between states). [WP: 
Gibbs_sampling] 


GIF: Graphics interchange format. A 
common image compression format 
based on the Lempel-Ziv-Welch algo- 
rithm. [Umb98:1.8] 


GIS: See geographic information system. 
[WP:Geographic_information_system] 


gist image descriptor: The spatial enve- 
lope model of a scene is based on four 
global properties of the image (natural- 
ness, openness, roughness and expan- 
sion). Outputs from a set of image 
filters like Gabor filters (learned from
image data) are combined to give an 
estimate of each of the four global 
properties. Collectively, the four val- 
ues define a global descriptor for the 
image, suitable for image grouping and 
image indexing. [OT01]


glint: A specular reflection visible on a 
mirror-like surface: [WP:Glint] 




global point signature: For a fixed 
point x, the global point signature 
G(x) is a vector whose compo- 
nents are scaled eigenfunctions of the 
Laplace-Beltrami operator evaluated at 
x. 


global positioning system (GPS): A 
system of satellites that allow the posi- 
tion of a GPS receiver to be deter- 
mined in absolute, earth-referenced 
coordinates. Accuracy of standard 
civilian GPS is of the order of 
meters. Greater accuracy is obtain- 
able using differential GPS. [WP: 
Global_Positioning_System] 


global property: A property of a mathe- 
matical object that depends on all com- 
ponents of the object. For example, the 
average intensity of an image is a global 
property, as it depends on all the image 
pixels. [WP:Global_variable] 


global structure extraction: Identifica- 
tion of high level structures or rela- 
tionships in an image (e.g., symmetry 
detection). [OIA01]


global transform: A general term 
describing an operator that transforms 
an image into some other space. 
Sample global transforms include 
the discrete cosine transform, the
Fourier transform, the Haar transform,
the Hadamard transform, the Hartley
transform, histograms, the Hough
transform, the Karhunen-Loève
transform, the Radon transform and 
the wavelet transform. [CTM+96] 


gold standard: A term used to refer 
either to the ground truth or the result 
obtained using the current state-of-the- 
art (i.e., the best-performing) approach 
for a given task to which we are com- 
paring. 


golden template: An image of an 
unflawed object or scene that is used 
within template matching to identify 
any deviations from the ideal object or 
scene. [XG00]


gonioreflectometer: A light source and 
sensor arrangement used for collect- 
ing intensity data from many angles 
in order to generate a model of the 
bidirectional reflectance distribution 
function of a given surface or object. 
[Foo97] 


gradient: Rate of change. Frequently 
associated with edge detection. See 
also gray scale gradient: [Nal93:3.1.2] 


[Figure: intensity and gradient, each plotted against position in image row]


gradient-based flow estimation: Esti- 
mation of the optical flow based on 
gradient images. This computation can 
be done directly through the compu- 
tation of a time derivative as long as 
the movement between frames is quite 
small. See also the aperture problem. 
[KTB87] 


gradient constraint equation: Equation
relating optical flow velocity in
the image $(u, v)$ to the image intensity
function $I(x, y, t)$. The usual assumption
in gradient-based approaches is
that the intensity of an object point is
constant over time and changes must
be due entirely to the motion of the
camera. Camera motion makes a point
$(x, y, t)$ change position in the image
over a time $\delta t$. Assuming constant
intensity, the image point is the same
at time $t$ and time $t + \delta t$ thus:

$$I(x, y, t) = I(x + \delta x, y + \delta y, t + \delta t),$$

thence from its Taylor series:

$$I(x, y, t) = I(x + \delta x, y + \delta y, t + \delta t) = I(x, y, t) + \frac{\partial I}{\partial x}\delta x + \frac{\partial I}{\partial y}\delta y + \frac{\partial I}{\partial t}\delta t + \text{high-order terms}.$$

From this follows, on dividing by $\delta t$
with $u = \delta x / \delta t$ and $v = \delta y / \delta t$,
the gradient constraint equation:

$$\frac{\partial I}{\partial x}u + \frac{\partial I}{\partial y}v + \frac{\partial I}{\partial t} = 0$$
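The gradient constraint can be checked numerically on a synthetic pattern that translates with a known velocity. In the sketch below the brightness function, the velocity (U, V), the sample point and the finite-difference step are all assumptions chosen for illustration.

```python
import math

U, V = 1.5, -0.5   # assumed true image velocity (pixels per frame)

def intensity(x, y, t):
    """Synthetic brightness pattern translating with velocity (U, V)."""
    return math.sin(0.3 * (x - U * t)) + math.cos(0.2 * (y - V * t))

def partials(x, y, t, h=1e-3):
    """Central-difference estimates of the partial derivatives of I in x, y and t."""
    ix = (intensity(x + h, y, t) - intensity(x - h, y, t)) / (2 * h)
    iy = (intensity(x, y + h, t) - intensity(x, y - h, t)) / (2 * h)
    it = (intensity(x, y, t + h) - intensity(x, y, t - h)) / (2 * h)
    return ix, iy, it

ix, iy, it = partials(4.0, 7.0, 2.0)
residual = ix * U + iy * V + it   # vanishes when brightness is conserved
```

Note that a single point gives one equation in the two unknowns u and v, which is why the constraint alone cannot recover the full flow (the aperture problem).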


gradient descent: A popular method of
optimization. The central idea is to
find a minimum of a function $f(x)$
by repeatedly moving in the direction
of steepest descent (i.e., the direction
opposite to the gradient). If $f$ has a
minimum at $x^{*}$ and we begin iterating
at a point $x_0$ then we search for a minimum
of $f$ in the direction $-\nabla f(x)$ which
is the direction of steepest descent.
For a suitable step size $t_i$ we iteratively
compute $x_{i+1} = x_i - t_i \nabla f(x_i)$ until
convergence. [Kre83:pp. 871-872]
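A minimal sketch of the gradient descent iteration follows, using a constant step size on a simple quadratic. The objective function, the step size and the stopping tolerance are assumptions for illustration; practical methods usually choose the step by a line search.

```python
def grad_f(p):
    """Gradient of f(x, y) = (x - 3)^2 + 2 (y + 1)^2, whose minimum is at (3, -1)."""
    x, y = p
    return (2.0 * (x - 3.0), 4.0 * (y + 1.0))

def gradient_descent(grad, start, step=0.1, tol=1e-10, max_iter=10_000):
    """Repeatedly move against the gradient until its norm falls below tol."""
    p = start
    for _ in range(max_iter):
        g = grad(p)
        if sum(c * c for c in g) ** 0.5 < tol:
            break
        p = tuple(pi - step * gi for pi, gi in zip(p, g))
    return p

minimum = gradient_descent(grad_f, (0.0, 0.0))
```

With this step size each coordinate error shrinks by a fixed factor per iteration (0.8 and 0.6 here), so the iterates converge to (3, -1).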


gradient edge detection: Edge 
detection based on image gradients. 
[BB82:3.3.1] 


gradient feature: Semi-invariant fea- 
tures (often edges) computed after 
applying a gradient operator to an 
image. [Liu07] 


gradient filter: A filter that is con- 
volved with an image to create an 
image in which every point repre- 
sents the gradient in the original 
image in an orientation defined by 
the filter. Normally two orthogonal fil- 
ters are used; by combining them, a 
gradient vector can be determined for 
every point. Common filters include 
the Roberts cross gradient operator, 
the Prewitt gradient operator and the 
Sobel gradient operator. The Sobel hor- 
izontal gradient operator gives: [WP: 
Edge_detection] 


[Figure: an image and the output of the gradient filter; mask row shown: -1 0 1]
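The effect of a horizontal gradient filter can be sketched on a tiny step-edge image. The 3x3 mask below is the commonly quoted Sobel mask for the horizontal derivative; the test image, the function name and the choice to compute only pixels where the mask fits are assumptions. The code applies the mask as a correlation (mask not flipped); a true convolution would flip this antisymmetric mask and so negate the response.

```python
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]   # responds to intensity changes along each row

def filter2d(img, mask):
    """Correlate img with a 3x3 mask, computing only pixels where the mask fits."""
    h, w = len(img), len(img[0])
    out = [[0] * (w - 2) for _ in range(h - 2)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y - 1][x - 1] = sum(
                mask[j][i] * img[y + j - 1][x + i - 1]
                for j in range(3) for i in range(3)
            )
    return out

# 4x6 image with a vertical step edge between columns 2 and 3
image = [[0, 0, 0, 9, 9, 9] for _ in range(4)]
gx = filter2d(image, SOBEL_X)
```

The response is zero in the flat regions and large at the two columns adjacent to the step.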


gradient image: See edge image. [WP:
Image_gradient#Computer_vision]


gradient location and orientation
histogram (GLOH) features: A 128- 
dimensional robust descriptor which 
is an extension of the SIFT descriptor 
that uses histograms with 17 location 
and 16 orientation bins in a log-polar 
configuration. The dimensionality is 
reduced via post-binning principal 
component analysis. [Sze10:4.1] 


gradient magnitude thresholding: 
Thresholding of a gradient image in
order to identify “strong” edge points: 
[KR82] 


gradient matching stereo: An 
approach to stereo matching in 
which the image gradients (or features 
derived from the image gradients) are 
matched. [CS09:6.9] 


gradient operator: An image- 
processing operator that produces a 
gradient image from a gray scale input 
image I. Depending on the usage of
the term, the output could be the
vectors ∇I of the x and y derivatives
at each point or the magnitudes of
these gradient vectors. The usual role
of the gradient operator is to locate
regions of strong gradients that signal
the position of an edge. The figure
shows a gray scale image and its
intensity gradient magnitude image,
where darker lines indicate stronger
magnitudes:
[Figure: gray scale image and its gradient magnitude image]
The gradient was calculated using the
Sobel gradient operator. [DH73:7.3]


gradient space: A representation of surface
orientations in which each orientation
is represented by a pair $(p, q)$
where $p = \partial z / \partial x$ and $q = \partial z / \partial y$ (where the z
axis is aligned with the optical axis of
the viewing device): [Hor86:15.3]
[Figure: surface orientations and the corresponding points in gradient space]


gradient vector: A vector describing the 
magnitude and direction of maximal 
change on an N-dimensional surface. 
[WP:Gradient] 


gradient vector flow: In active contour 
models, a dense vector field derived by 
minimizing an energy function which 
comes from solving a pair of decoupled 
differential equations that diffuse the 
gradient vectors of an image. [XP98] 


gradient vector flow snake (GVF): An 
extension of active contour models 
where the contours converge to 
boundary concavities and do not need 
to be initialized close to a boundary. 
[XP98] 


graduated non-convexity: An algo- 
rithm for finding a global minimum in 
a function that has many sharp local 
minima (a non-convex function). This 
is achieved by approximating the func- 
tion by a convex function with just 
one minimum (near the global mini- 
mum of the non-convex function) and 
then gradually improving the approxi- 
mation. [Bla83] 


grammar: A system of rules constrain- 
ing the way in which primitives 
(such as words) can be combined. 
Used in computer vision to repre- 
sent objects where the primitives are 
simple shapes, textures or features. 
[DH73:12.2.1] 


grammatical representation: A
representation that describes shapes using a
number of primitives that can be com- 
bined using a particular set of rules (the 
grammar). [RA06] 


granulometric spectrum: The resul- 
tant distribution from a granulometry. 
[SD92] 


granulometry: The study of the size 
characteristics of a set (e.g., the 
size of a set of regions). Most nor- 
mally this is achieved by applying 
a series of morphological openings 
(with structuring elements of increasing
size) and then studying the resultant
size distributions. [WP:Granulometry_ 
(morphology)] 
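A granulometry can be sketched in one dimension: open a binary signal with flat structuring elements of increasing length and record how much foreground survives. The signal, the left-anchored structuring element and the function names are assumptions made for this illustration.

```python
def erode(signal, k):
    """Binary erosion by a flat element of length k (origin at its left end)."""
    n = len(signal)
    return [int(i + k <= n and all(signal[i:i + k])) for i in range(n)]

def dilate(signal, k):
    """Binary dilation by the same element."""
    n = len(signal)
    return [int(any(signal[max(0, i - k + 1):i + 1])) for i in range(n)]

def opening(signal, k):
    """Erosion followed by dilation: removes runs shorter than k."""
    return dilate(erode(signal, k), k)

def granulometry(signal, max_size):
    """Foreground area that survives an opening of each size 1..max_size."""
    return [sum(opening(signal, k)) for k in range(1, max_size + 1)]

# foreground runs of length 1, 3 and 5
signal = [1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0]
areas = granulometry(signal, 6)
spectrum = [a - b for a, b in zip(areas, areas[1:])]   # area lost at each size step
```

The differences between successive areas form a size distribution: here it has peaks at sizes 1, 3 and 5, matching the run lengths in the signal.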


graph: A set of vertices V and a 
set of edges E ⊆ V × V linking pairs
of vertices. Vertices u and v are
neighbors if (u, v) ∈ E or (v, u) ∈ E.
See graph isomorphism and subgraph 
isomorphism. This is a graph with five 
vertices: [FP03:14.5.1] 




graph classification problem: A
classification problem where each
input object is a graph. [GHC02]


graph clustering: A clustering prob- 
lem where each input object is a 
graph. Contrast with graph theoretic 
clustering. [RGS08]


graph cut: A partition of the vertices of a
directed graph V into two disjoint sets 
S and T. The cost of the cut is the cost 
of all the edges that go from a vertex 
in S to a vertex in T. [CS09:6.11] 
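The cost of a cut can be computed directly from the definition. In the sketch below the small directed graph, its edge costs and the function name are made up for illustration.

```python
def cut_cost(edges, S, T):
    """Sum the costs of the directed edges that go from a vertex in S to one in T."""
    return sum(cost for (u, v), cost in edges.items() if u in S and v in T)

edges = {
    ("s", "a"): 3, ("s", "b"): 2,
    ("a", "b"): 1, ("a", "t"): 2,
    ("b", "t"): 3, ("b", "a"): 4,
}
cost = cut_cost(edges, {"s", "a"}, {"b", "t"})
other = cut_cost(edges, {"s", "b"}, {"a", "t"})
```

Only edges leaving S count: in the first cut the edge from b to a runs from T back to S and is ignored.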


graph embedding: A mapping from a 
graph to a vector space. The extracted 
features could be used to define a 
graph kernel. [GVB11] 


graph isomorphism: Two graphs are 
isomorphic if there exists a mapping 
(bijection) between their vertices that 
makes the edge sets identical. Determining
whether two graphs are isomorphic is the
“graph isomorphism problem”, which is in NP
but is not known to be NP-complete.
These small graphs are isomorphic
with A:b, C:a and B:c: [Fau93:11.2.2]
[Figure: two small isomorphic graphs with vertices A, B, C and a, b, c]


graph kernel: A kernel function (defini- 
tion 3) that returns a scalar giving the 
similarity of two graphs. [VSKB10] 


graph matching: A general term 
describing techniques for com- 
paring two graph models. These 
techniques may attempt to find 
graph isomorphisms or subgraph 
isomorphisms, or may just try to 
establish similarity between graphs:
[Fau93:11.2.2]
[Figure: two examples of matching a graph to a model]


graph median: In graph theory, the 
vertex for which the sum of lengths 
of shortest paths to all other vertices is 
minimized. [MS79] 
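For an unweighted, connected graph the median vertex can be found by running a breadth-first search from every vertex. The example graph and function names below are assumptions for illustration.

```python
from collections import deque

def shortest_path_lengths(graph, source):
    """Breadth-first search: number of edges from source to each reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def graph_median(graph):
    """Vertex minimizing the sum of shortest-path lengths to all other vertices."""
    totals = {u: sum(shortest_path_lengths(graph, u).values()) for u in graph}
    return min(totals, key=totals.get), totals

# undirected path A - B - C - D with an extra leaf E attached to C
graph = {
    "A": ["B"], "B": ["A", "C"], "C": ["B", "D", "E"],
    "D": ["C"], "E": ["C"],
}
median, totals = graph_median(graph)
```

Vertex C has the smallest total distance (5), so it is the median of this graph.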


graph model: A model of data in terms 
of a graph. Typical uses in computer 
vision include object representation 
(see graph matching) and edge gra- 
dients (see graph searching). [WP: 
Graphical_model] 


graph partitioning: The operation of 
splitting a graph into subgraphs sat- 
isfying some criteria. For example 
we might want to partition a graph 
of all polygonal edge segments in 
an image into subgraphs correspond- 
ing to objects in the scene. [WP: 
Graph_partition] 


graph pruning: The simplification of a
graph by the removal of vertices and 
edges. 


graph representation: See graph 
model. [WP:Graph_(data_structure) 
#Representations] 


graph searching: Looking for a specific 
node or path through a graph. Used for, 
among other things, border detection 
(e.g., in an edge gradient image) and 
object identification (e.g., a decision 
tree). [TSF95] 


graph similarity: The degree to which 
two graph representations are similar.
Typically (in computer vision) the
representations are not exactly the 
same and hence a double subgraph 
isomorphism may need to be found to 
evaluate similarity. [WFKM97] 


graph theoretic clustering: Clustering 
algorithms that use concepts from 
graph theory, in particular lever- 
aging efficient graph-theoretic algo- 
rithms such as maximum flow. [WP: 
Cluster_analysis] 


graphical model: See probabilistic
graphical model.


grassfire algorithm: A technique for 
finding a region skeleton based on 
wave propagation. A virtual fire is lit 
on all region boundaries and the skele- 
ton is defined by the intersection of the 
wave fronts: [LL92] 


[Figure: fire fronts spreading over time from the object boundary to the skeleton]


Grassmannian space: The set of k- 
dimensional linear subspaces in an n- 
dimensional vector space V, denoted 
as Gr(n, k). [CHS95] 

grating: See diffraction grating. [WP: 
Grating] 


gray level . . .: See gray scale... 


gray scale: A monochromatic represen- 
tation of the value of a pixel. Typically 
this represents image brightness and 
ranges from 0 (black) to 255 (white): 
[Gal90:3.4] 


gray scale co-occurrence: The
occurrence of two particular gray levels
some particular distance and orien- 
tation apart. Used in co-occurrence 
matrices. [Nev82:8.3.1] 
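Counting gray scale co-occurrences for one displacement gives a co-occurrence matrix. The sketch below uses a small four-level image and a displacement of one pixel to the right; the image, the displacement and the function name are assumptions for illustration, and the counts are directional (not symmetrized).

```python
def cooccurrence(img, dx, dy, levels):
    """Count pairs (img[y][x], img[y + dy][x + dx]) for a fixed displacement."""
    h, w = len(img), len(img[0])
    counts = [[0] * levels for _ in range(levels)]
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                counts[img[y][x]][img[y2][x2]] += 1
    return counts

image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
C = cooccurrence(image, dx=1, dy=0, levels=4)
```

Entry C[a][b] is the number of times gray level b appears immediately to the right of gray level a.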


gray scale correlation: The cross 
correlation of gray scale values 
in image windows or full images. 
[WKT01]


gray scale distribution model: A 
model of how gray scales are dis- 
tributed in some image region. See also 
intensity histogram. [OPM02] 


gray scale gradient: The rate of change 
of the gray levels in a gray scale image. 
See also edge, gradient image and first 
derivative filter. [SFWK02]


gray scale image: A monochrome image 
in which pixels typically represent 
brightness values ranging from 0 to 
255. See also gray scale: [SQ04:4.1.1] 


gray scale mathematical morphol- 
ogy: The application of a mathematical 
morphology operation to gray scale 
images. Each quantization level is 
treated as a distinct set where pixels 
are members of the set if they have a 
value greater than or equal to particu- 
lar quantization levels. [SQ04:7.2] 


gray scale moment: A moment that is 
based on image or region gray scales. 
See also binary moment. [PKK08] 


gray scale morphology: See gray scale
mathematical morphology.


gray scale similarity: See gray scale 
correlation. 


gray scale texture moment: A moment 
that describes texture in a gray scale 
image (e.g., the Haralick texture oper- 
ator describes image homogeneity). 
[CD94] 


gray scale transformation: A gen- 
eral term describing a class of 
image-processing operations that 
apply to gray scale images and simply 
manipulate the gray scale of pixels. 
Example operations include contrast 
stretching and histogram equalization. 
[CK94] 
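Contrast stretching is one of the simplest gray scale transformations: each pixel is remapped through a linear function of its gray value alone. The sketch below is an illustration with an assumed 0 to 255 output range and a made-up image.

```python
def contrast_stretch(img, out_min=0, out_max=255):
    """Linearly map the image's own min..max gray range onto out_min..out_max."""
    flat = [p for row in img for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:                       # flat image: nothing to stretch
        return [row[:] for row in img]
    span = out_max - out_min
    return [[round(out_min + (p - lo) * span / (hi - lo)) for p in row] for row in img]

image = [[50, 70], [110, 150]]         # low-contrast input occupying 50..150
stretched = contrast_stretch(image)
```

The darkest input value maps to 0 and the brightest to 255, with intermediate values spread proportionally.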


gray value . . .: See gray scale... 


greedy search: A search algorithm seek- 
ing to maximize a local criterion 
instead of a global one. Greedy algo- 
rithms sacrifice generality for speed. 
For instance, the stable configuration 
of a snake is typically found by an iter- 
ative energy minimization. The snake 
configuration at each step of the opti- 
mization can be found globally, by 
searching the space of all allowed con- 
figurations of all pixels simultaneously 
(a large space) or locally (greedy algo- 
rithm), by searching the space of all 
allowed configurations of each pixel 
individually (a much smaller space). 
[NA05:6.3.2] 


grey ...: See gray... 


grid filter: An approach to noise 
reduction where a nonlinear function 
of features (pixels or averages of a num- 
ber of pixels) from the local neigh- 
borhood are used. Grid filters require 
a training phase where noisy data 
and corresponding ideal data are pre- 
sented. [VJ98] 


ground following: See ground tracking. 


ground plane: The horizontal plane that 
corresponds to the ground (the sur- 
face on which objects stand). This con- 
cept is only really useful when the 
ground is roughly flat. The ground 
plane is highlighted in the figure: [WP: 
Ground_plane] 


ground tracking: A loosely defined
term describing the robot naviga- 
tion problem of sensing the ground 
plane and following some path. [WP: 
Ground_track] 


ground truth: In performance analysis,
the true value, or the most accurate 
value achievable, of the output of a spe- 
cific instrument under analysis, e.g., 
a vision system measuring the diame- 
ter of circular holes. Ground truth val- 
ues may be known theoretically, e.g., 
from formulae, or obtained through an 
instrument more accurate than the one 
being evaluated. [TV98:A.1] 


group activity: An activity involving 
multiple individuals within a shared 
common space. [GX11:Ch. 7] 


group association context: The use
of contextual information in behavior 
analysis tasks that target the under- 
standing of group activity derived from 
the association of several individuals 
within a group where some knowledge 
of individual behaviors is extracted via 
behavior classification. [GX11:Ch. 2] 


grouping: 1) In human perception, the
tendency to perceive certain patterns 
or clusters of stimuli as a coherent, dis- 
tinct entity as opposed to a set of inde- 
pendent elements. 

2) A whole class of segmentation algo- 
rithms based on the idea of group- 
ing. Much of this work was inspired 
by the Gestalt school of psychol- 
ogy. See also segmentation, image 
segmentation, supervised classification 
and clustering. [FP03:Ch. 14] 


grouping transform: An image analysis
technique for grouping image features 
together (e.g., based on collinearity). 
[TV98:5.5] 


H.263: Video compression standard that 
uses a very low-bitrate compressed for- 
mat. As it was originally designed for 
video conferencing it supports sev- 
eral video frame sizes: 128 x 96, 176 x 
144, 352 x 288, 704 x 576 and 1408 x 
1152. [Gal91] 


H.264: Also known as MPEG-4 Part 
10 or AVC (Advanced Video Cod- 
ing). A flexible popular standard 
for high-definition video compression. 
H.264/AVC contains many new fea- 
tures over H.263 and offers a lower 
bitrate and more efficient compres- 
sion and flexibility for application to a 
wide variety of network environments. 
It consists of two main layers: the 
video coding layer (VCL) which inde- 
pendently encodes the video and the 
network abstraction layer (NAL) which 
formats the video and provides header 
information allowing it to be transmit- 
ted. [Ric03] 


Haar transform: A wavelet transform 
that is used in image compression. 
The basis functions are similar to those 
used by first derivative edge detection 
systems, resulting in images that are 
decomposed into horizontal, diagonal 
and vertical edges at different scales. 
[PS06:4.4] 


Hadamard transform: An operation 
that can transform an image to its con- 
stituent Hadamard components. A fast 
version of the algorithm exists, which
is similar to the fast Fourier transform,
but all values in the basis functions
are either +1 or −1. It requires
significantly less computation and as such
is often used for image compression.
[Umb98:2.5.3]
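The fast version can be sketched with the usual butterfly structure, which needs only additions and subtractions because every basis value is +1 or −1. The code below is an unnormalized one-dimensional transform in natural (Hadamard) ordering; the input signal and function name are assumptions for illustration.

```python
def fwht(values):
    """Fast Walsh-Hadamard transform (unnormalized, natural ordering)."""
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                x, y = a[i], a[i + h]
                a[i], a[i + h] = x + y, x - y   # butterfly: sum and difference only
        h *= 2
    return a

signal = [1, 0, 1, 0, 0, 1, 1, 0]
coeffs = fwht(signal)
restored = [c // len(signal) for c in fwht(coeffs)]   # the transform is its own inverse up to 1/n
```

Applying the transform twice and dividing by the length recovers the original signal.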


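Hahn moment: Given a digital
image $f(x, y)$ of size $N \times N$,
the $(m + n)$th order Hahn
moment of the image is:

$$H_{mn} = \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x, y)\, h_m^{(\mu,\nu)}(x, N)\, h_n^{(\mu,\nu)}(y, N)$$

where $h_n^{(\mu,\nu)}(x, N)$ is the Hahn
polynomial of order $n$. [ZSZ+05]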


halftoning: See dithering. [WP:Halftone]


Hamming distance: The number of dif- 
ferent bits in corresponding positions 
in two bit strings. For instance, the 
Hamming distance of 01110 and 01100 
is 1, that of 10100 and 10001 is 2. A 
very important concept in digital com- 
munications. [CS09:6.3.2] 
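
The two examples in the definition can be checked with a short sketch. The function name `hamming_distance` is illustrative, and bit strings are assumed to be given as Python strings of equal length.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


print(hamming_distance("01110", "01100"))  # 1
print(hamming_distance("10100", "10001"))  # 2
```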


hand orientation: The direction of the 
hand. Axiomatically defined as the vec- 
tor from the center point of the wrist to 
the base of the longest (second) finger. 




hand-sign recognition: The recogni- 
tion of hand gestures such as those 
used in sign language. [CSW95]


hand tracking: The tracking of a
person’s hand in a video sequence, often
used in human-computer interaction.
[RK94]


hand-eye calibration: The calibration 
of a manipulator (such as a robot arm) 
together with a visual system (such as 
a number of cameras). The main issue 
here is ensuring that both systems use 
the same frame of reference. See also 
camera calibration. [HD95] 


hand-eye coordination: The use of 
visual feedback to direct the move- 
ment of a manipulator. See also 
hand-eye calibration. [HCM95] 


handwriting verification: Verification 
that a style of handwriting corresponds 
to that of some particular individual. 
[WP:Handwriting_recognition] 


handwritten character recognition: 
The automatic recognition of characters
that have been written by hand.
[WP:Handwriting_recognition] 


Hankel transform: A simplification 
of the Fourier transform for radi- 
ally symmetric functions. [WP:Hankel_ 
transform] 


harmonic homology: A perspective (or 
projective) collineation of period two. 


Harris corner detector: A corner
detection system where a corner is
detected if the eigenvalues of the
matrix M are large and locally maximum
($f(i, j)$ is the intensity at point
$(i, j)$):
$M = \begin{bmatrix} \frac{\partial f}{\partial i}\frac{\partial f}{\partial i} & \frac{\partial f}{\partial i}\frac{\partial f}{\partial j} \\ \frac{\partial f}{\partial j}\frac{\partial f}{\partial i} & \frac{\partial f}{\partial j}\frac{\partial f}{\partial j} \end{bmatrix}$
To avoid explicit computation of
the eigenvalues, the local maxima of
$\det(M) - 0.004 \times \mathrm{trace}(M)^2$ can be
used. This is also known as the Plessey
corner finder.
[WP:Harris_affine_region_detector#Harris_corner_measure]
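
A corner measure of this kind can be sketched in a few lines. The sketch below rests on several assumptions that are not part of the entry: the function name `harris_response` is hypothetical, gradients are taken by central differences, the gradient products are summed over a 3 x 3 window (without some such accumulation the determinant of M is always zero), and the commonly quoted form det(M) − k × trace(M)² is used with an assumed default of k = 0.04.

```python
def harris_response(img, k=0.04):
    """Return det(M) - k * trace(M)**2 for each pixel of a 2D list `img`.

    M is built from gradient products summed over a 3 x 3 window.
    Pixels within two samples of the border are left at 0.0.
    """
    rows, cols = len(img), len(img[0])
    fi = [[0.0] * cols for _ in range(rows)]  # derivative along i (rows)
    fj = [[0.0] * cols for _ in range(rows)]  # derivative along j (columns)
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            fi[i][j] = (img[i + 1][j] - img[i - 1][j]) / 2.0
            fj[i][j] = (img[i][j + 1] - img[i][j - 1]) / 2.0

    resp = [[0.0] * cols for _ in range(rows)]
    for i in range(2, rows - 2):
        for j in range(2, cols - 2):
            a = b = c = 0.0  # entries of the symmetric matrix [[a, b], [b, c]]
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    gi, gj = fi[i + di][j + dj], fj[i + di][j + dj]
                    a += gi * gi
                    b += gi * gj
                    c += gj * gj
            resp[i][j] = (a * c - b * b) - k * (a + c) ** 2
    return resp


# A bright square occupying the lower-right part of a dark image.
image = [[1.0 if (i >= 6 and j >= 6) else 0.0 for j in range(12)] for i in range(12)]
response = harris_response(image)
corner = max(((i, j) for i in range(12) for j in range(12)),
             key=lambda p: response[p[0]][p[1]])
print(corner)
```

On this synthetic image the response is positive near the corner of the square, negative along its straight edges and zero in flat regions, which is the behavior the measure is designed to give.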


Hartley transform: Similar transform to
the Fourier transform, but the
coefficients used are real (whereas those
used in the Fourier transform are
complex). [Low91:13.4]


hat transform: See Laplacian of 
Gaussian operator (also known as 
Mexican hat operator) and top hat 
operator. [WP:Top-hat_transform] 


Hausdorff distance: A measure of the 
distance between two sets of (image) 
points. For every point in both sets, 
determine the minimum distance to 
any point in the other set. The Haus- 
dorff distance is the maximum of these 
minimum values. [Fau93:10.3.1] 
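
The definition translates directly into a brute-force computation. The following is a minimal sketch, assuming points are coordinate tuples and Euclidean distance is the underlying metric; the function names are illustrative.

```python
import math


def min_distances(source, target):
    """For every point in `source`, the distance to its nearest point in `target`."""
    return [min(math.dist(p, q) for q in target) for p in source]


def hausdorff_distance(set_a, set_b):
    """Maximum, over the points of both sets, of the nearest-neighbor distances."""
    return max(min_distances(set_a, set_b) + min_distances(set_b, set_a))


A = [(0, 0), (1, 0)]
B = [(0, 0), (4, 0)]
print(hausdorff_distance(A, B))  # 3.0
```

In the example the point (4, 0) of B is 3 units from its nearest neighbor in A, which dominates every other nearest-neighbor distance. The cost is proportional to the product of the set sizes, so larger point sets would normally call for a spatial index.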


HDR: See high dynamic range imaging. 


HDTV: High Definition TeleVision.
[WP:High-definition_television]


height image: See range image.
[WP:Range_imaging]


Helmholtz reciprocity: An observation
by Helmholtz about the bidirectional
reflectance distribution function
$f(\vec{i}, \vec{e})$ of a local surface
patch, where $\vec{i}$ and $\vec{e}$ are
the incoming and outgoing light rays
respectively. The observation is that
the reflectance is symmetric about the
incoming and outgoing directions, i.e.,
$f(\vec{i}, \vec{e}) = f(\vec{e}, \vec{i})$.
More generally, the principle states
that a ray of light and its reverse have
identical influences through an optical
system which includes reflections,
refractions and absorptions in a passive
medium or at an interface. [FP03:4.2.2]


Helmholtz stereopsis: A surface recon- 
struction approach that attempts 
to exploit the symmetry of sur- 
face reflectance. Consider taking two 
images where the first is of the 
object illuminated by a point light 
source and the second swaps the light 
source and camera positions. Because 
of Helmholtz reciprocity, for corre- 
sponding pixels in the two images, the 
ratio of incident irradiance (onto the 
object) to emitted radiance (from the 
object) is the same. Depth and surface 
normals may be derived from the inten- 
sities of corresponding pixels indepen- 
dent of the bidirectional reflectance 
distribution function of the surface: 
[ZBK02] 


[Figure: Step 1 (eye point 1, point light 1) and Step 2 (point light 2, eye point 2)]


Hessian: The matrix of second
derivatives of a multivariate scalar
function. It can be used to design an
orientation-dependent second derivative
edge detection system. [FP03:3.1.2]


heterarchical/mixed control: An
approach to system control where
control is shared amongst several
systems. [DCM88]


heterogeneous sensor network: A col- 
lection of sensors, often low band- 
width and mobile, of varying types that 
aggregate data in an environment. 


heuristic search: A process that 
employs commonsense rules (heuris- 
tics) to speed up searching. [BB82:4.4] 


hexagonal image representation: An 
image representation where the pixels 
are hexagonal rather than rectangu- 
lar. This representation might be used 
because it is similar to the human retina 
or because the distances to all adjacent 
pixels are equal, unlike with diagonally 
connected pixels in rectangular grids.
[WA89]


hidden Markov model (HMM): An
HMM is a form of state space model
where the hidden state is discrete.
There is a hidden Markov chain evolv- 
ing according to a state transition 
probability distribution and observa- 
tions giving partial information about 
the hidden state are made according 
to the observation probability distri- 
bution. The figure shows a directed 
graphical model representation of the 
HMM, where Z_i are the hidden state
variables, X_i are the observation
variables and A denotes the transition
probabilities to the new state:
[Figure: chain of hidden states Z1, Z2, Z3, ..., ZN linked by transitions A, each state Zi emitting an observation Xi]


HMMs have been used extensively in 
vision tasks such as activity analysis and 
handwritten character recognition. 
[Bis06:13.2] 
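
As an illustration of how the transition and observation distributions combine, the likelihood of an observation sequence can be computed with the standard forward recursion. This is a sketch under stated assumptions: the names `pi` (initial state distribution) and `B` (observation probabilities) are introduced here for illustration, A is the transition matrix as in the entry, and states and symbols are assumed to be small integer indices.

```python
def forward_likelihood(pi, A, B, observations):
    """Probability of an observation sequence under a discrete HMM.

    pi[i]   : probability that the chain starts in hidden state i
    A[i][j] : probability of moving from state i to state j
    B[i][o] : probability of observing symbol o while in state i
    """
    n = len(pi)
    # alpha[i] = P(observations so far, current hidden state = i)
    alpha = [pi[i] * B[i][observations[0]] for i in range(n)]
    for o in observations[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
print(forward_likelihood(pi, A, B, [0, 1, 0]))
```

For long sequences the values in `alpha` underflow, so practical implementations rescale at each step or work in the log domain.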


hidden state variable: A state space 
model consists of a state variable evolv- 
ing over time, and observations related 
stochastically to the state. If the state 
variable is not directly observable it 
is called a hidden state variable. In a 
hidden Markov model the state variable 
is discrete; for the Kalman filter, it is a 
continuous-valued vector. [Bis06:13.1] 


hierarchical: A general term referring 
to the approach of considering data at 
a low level of detail initially and then 
gradually increasing the level of detail. 
This approach often results in better 
performance. [WP:Hierarchy] 


hierarchical clustering: An approach
to clustering (also called agglomerative
clustering) in which each item is
initially put in a separate cluster. The 
two most similar clusters are merged 
and this merging is repeated until 
some condition is satisfied (e.g., no 
clusters of less than a particular size 
remain). [DH73:6.10] 
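
The merge loop can be sketched as follows. Several choices here are assumptions made for illustration and are not prescribed by the entry: the items are one-dimensional numbers, cluster similarity is single linkage (the distance between the closest pair of members), and the stopping condition is a target number of clusters.

```python
def agglomerative_clustering(points, num_clusters):
    """Repeatedly merge the two closest clusters until `num_clusters` remain."""
    clusters = [[p] for p in points]  # each item starts in its own cluster

    def linkage(c1, c2):
        # Single linkage: distance between the closest pair of members.
        return min(abs(a - b) for a in c1 for b in c2)

    while len(clusters) > num_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


print(agglomerative_clustering([1.0, 1.2, 5.0, 5.1, 9.0], 3))
```

Recording the order of the merges, rather than only the final partition, yields the tree (dendrogram) that gives the method its name.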


hierarchical coding: Coding of (image)
data at multiple layers starting with
the lowest level of detail and gradually
increasing the resolution. See
also hierarchical image compression.
[CLP85]


hierarchical hidden Markov model: 
A hierarchical extension of the hidden 
Markov model that models sequences 
with structure at many length or time 
scales. [Mur12:15.6.2] 


hierarchical Hough transform: A 
technique for improving the efficiency 
of the standard Hough transform. Com- 
monly used to describe any Hough- 
based technique that solves a sequence 
of problems beginning with a low- 
resolution Hough space and proceed- 
ing to high-resolution space, or using 
low-resolution images, or operating on 
subimages of the input image before 
combining the results. [QM89] 


hierarchical image compression: 
Image compression using hierarchical 
coding. This leads to the concept 
of progressive image transmission. 
[Sha92] 


hierarchical k-means: A divisive 
approach to hierarchical clustering. 
Initially k-means clustering is run on all 
the data with small k (e.g., k = 2) and 
each partition so created is divided 
again by k-means. This process is 
continued recursively until a stopping 
criterion is applied. The method 
is also known as tree-structured 
vector quantization. See also divisive 
clustering. [NS06]


hierarchical matching: Matching at 
increasingly greater levels of detail. 
This approach can be used when 
matching images or more abstract
representations. [FS07]


hierarchical model: A model formed of
submodels, each of which may have
further smaller submodels. The model
may contain multiple instances of the
subcomponent models. The subcomponents
may be placed relative to the model
using a coordinate system transformation
or may just be listed in a set
structure. The figure shows a
three-level hierarchical model with
multiple usage of the subcomponents:
[WP:Hierarchical_database_model]


hierarchical recognition: See
hierarchical matching.
[WP:Cognitive_neuroscience_of_visual_object_recognition#Hierarchical_Recognition_Processing]


hierarchical texture: A way of
considering texture elements at multiple
levels (e.g., basic texture elements may
themselves be grouped together to
form a texture element at another
scale, and so on). [BB82:6.2]


hierarchical thresholding: A
thresholding technique where an
image is considered at different levels
of detail in a pyramid data structure,
and thresholds are identified at
different levels in the pyramid starting
at the highest level. [SHB08:5.1.4]


high dynamic range imaging
(HDR/HDRI): Imaging method
that allows the capture of a much
wider span (i.e., dynamic range) of
intensities across images that span
multiple areas of contrast, e.g., bright
sky and a relatively dark landscape.
This can be achieved by blending
multiple images taken at different
exposure levels or by using a
specialist camera. [Sze10:10.2]


high-level vision: A general term
referring to image analysis and
understanding tasks (i.e., those tasks
that address reasoning about what is
seen, as opposed to basic processing of
images). [BKKP05:5.10]


high-pass filter: A frequency domain
filter that removes or suppresses
all low-frequency components.
[Umb98:2.5.4]


higher-order graph cut: An undirected
graphical model (UGM) with clique
sizes greater than two is known as a
higher-order UGM. The graph cut
algorithm can be used to carry out
approximate inference in such a network.
[RLKT10]


highlight: See specular reflection. 


hinge loss function: The hinge loss for
x ∈ R is defined as H(x) = max(0, 1 − x).
Used in training machine learning
classifiers, typically support vector
machines. [Bis06:Ch. 7]
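
A minimal sketch of the definition follows. The interpretation of x as the product of a ±1 class label and the classifier score is the usual support vector machine convention and is stated here as an assumption; the function names are illustrative.

```python
def hinge_loss(x: float) -> float:
    """H(x) = max(0, 1 - x)."""
    return max(0.0, 1.0 - x)


def margin_hinge_loss(label: int, score: float) -> float:
    """Hinge loss of a classifier score for a label in {-1, +1} (assumed convention)."""
    return hinge_loss(label * score)


print(hinge_loss(2.0))               # 0.0: confidently correct, no penalty
print(hinge_loss(0.5))               # 0.5: correct but inside the margin
print(margin_hinge_loss(-1, 1.0))    # 2.0: wrong side of the decision boundary
```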


histogram: A representation of the
frequency distribution of some values.
See intensity histogram. [Low91:5.2]


histogram analysis: A general term
describing a group of techniques 
that abstract information from 
histograms (e.g., determining the anti- 
mode/trough in a bimodal histogram 
for use in thresholding). [Car87] 


histogram dissimilarity measure: A 
dissimilarity metric between two 
histogram distributions. Converse to a 
similarity metric between the same dis- 
tributions. [WZC05] 


histogram equalization: An image 
enhancement operation that processes 
a single image and results in an 
image with a uniform distribution of 
intensity levels (i.e., whose intensity
histogram is flat). When this technique
is applied to a digital image, however,
the resulting histogram will often have
large values interspersed with zeros.
[Low91:5.3] 
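
One common way to carry out the operation on a digital image is to remap each intensity through the scaled cumulative histogram. The sketch below assumes an image stored as a list of rows of integer intensities in the range 0 to levels − 1; the function name and the particular scaling (subtracting the smallest nonzero cumulative count) are illustrative choices rather than the only possible formulation.

```python
def equalize_histogram(image, levels=256):
    """Return a histogram-equalized copy of a grayscale image (list of rows)."""
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1

    # Cumulative histogram: number of pixels with intensity <= each level.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    total = cdf[-1]
    if total == 0:
        return []
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:  # only one intensity present; nothing to spread
        return [list(row) for row in image]

    # Look-up table mapping old intensities to new ones.
    lut = [max(0, round((c - cdf_min) * (levels - 1) / (total - cdf_min)))
           for c in cdf]
    return [[lut[value] for value in row] for row in image]


print(equalize_histogram([[10, 20], [30, 40]]))  # [[0, 85], [170, 255]]
```

The example also shows the effect noted in the definition: the four input levels are spread across the full range, so the output histogram has four occupied bins separated by long runs of zeros.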


histogram modeling: A class of
techniques, such as histogram
equalization, that modify the dynamic
range and contrast of an image by
changing its intensity histogram into
one with the desired properties. [DN98]


histogram modification: See 
histogram modeling. 


histogram moment: A moment
derived from a histogram.
[WP:Algorithms_for_calculating_variance#Higher-order_statistics]


histogram of local appearance con- 
text (HLAC): Context-based shape 
descriptor that uses local appearance 
descriptors that are distinctive and 
resilient to noise. A histogram of these 
features is constructed to form the final 
descriptor. [SBH09]


histogram of oriented gradients 
(HOG) descriptor: A feature descrip- 
tor based on a histogram of local 
gradient orientation spatially aggre- 
gated over a set of blocks, each in turn 
sub-divided into cells, within the local 
image neighborhood. Not invariant to 
rotation. [DT05] 


histogram of shape context (HOSC): 
A method of describing and comparing 
shapes that uses features measured in a 
relative coordinate system to allow for 
a degree of invariance. The coordinate 
system used is usually log-polar. [AT04] 


histogram smoothing: The applica- 
tion of a smoothing filter (e.g., 
Gaussian smoothing) to a histogram. 
This is often required before histogram 
analysis operations can be applied.
[ADA09]


hit and miss (hit or miss) operator:
A mathematical morphology operation
where a new image is formed by using
logical AND on corresponding bits for
every pixel of an input image and a
structuring element. This operator is
most appropriate for binary images
but may also be applied to gray scale
images. [WP:Hit-or-miss_transform]


HK segmentation: See mean
and Gaussian curvature shape
classification. [WP:Gaussian_curvature]


HMM: See hidden Markov model. 


holography: The process of creating a 
three dimensional image (a hologram) 
by recording the interference pattern 
produced by coherent laser light that 
has been passed through a diffraction 
grating. [WP:Holography] 


homogeneous, homogeneity: See 
homogeneous coordinates and 
homogeneous texture. 


homogeneous coordinates: Points
described in projective space.
For example an (x, y, z) point in
Euclidean space would be described
as (λx, λy, λz, λ) for any nonzero λ
in homogeneous coordinates. Homogeneous
quantities such as points are equal if
they are scalar multiples of each other.
For example a 2D point is represented
as (x, y) in Cartesian coordinates
and in homogeneous coordinates by
the point (x, y, 1) and any nonzero
multiple thereof. [FP03:2.1.1]
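
The conversion in both directions, and the equality-up-to-scale rule, can be sketched as follows. The function names are illustrative, and the tolerance used for the equality test is an assumption.

```python
def to_homogeneous(point, scale=1.0):
    """Append a final coordinate; any nonzero `scale` represents the same point."""
    return tuple(scale * c for c in point) + (scale,)


def from_homogeneous(h):
    """Divide by the last coordinate to recover Cartesian coordinates."""
    *coords, w = h
    if w == 0:
        raise ValueError("a point at infinity has no Cartesian equivalent")
    return tuple(c / w for c in coords)


def same_point(h1, h2, tol=1e-9):
    """Two homogeneous vectors are equal if one is a scalar multiple of the other.

    Tested by checking that every 2 x 2 minor of the pair vanishes.
    """
    n = len(h1)
    return all(abs(h1[i] * h2[j] - h1[j] * h2[i]) <= tol
               for i in range(n) for j in range(i + 1, n))


print(to_homogeneous((2, 3)))             # (2.0, 3.0, 1.0)
print(from_homogeneous((4, 6, 2)))        # (2.0, 3.0)
print(same_point((2, 3, 1), (4, 6, 2)))   # True
```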


homogeneous representation: A
representation defined in projective
space. [HZ00:1.2.1]


homogeneous texture: A two-dimensional
(or higher) pattern, defined on a space
S ⊂ R², for which some functions (e.g.,
mean and standard deviation) applied to
a window on S have values that are
independent of the position of the
window. [WP:Homogeneous_space]


homography: The relationship
described by a homography
transformation. [WP:Homography]


homography transformation: Any
invertible linear transformation
between projective spaces. It is
commonly used for image transfer, which
maps one planar image or region to
another. The transformation can be
estimated using four point pairs, no
three of which are collinear.
[WP:Homography]


homomorphic filtering: An image
enhancement technique that
simultaneously normalizes brightness and
enhances contrast. It works by applying
a high-pass filter to the
(log-transformed) original image in the
frequency domain, hence reducing
intensity variation (which changes
slowly) and highlighting reflection
detail (which changes rapidly).
[Umb98:3.4.4]


homoscedastic noise: Noise which is
independent of the variable, sample
and signal with a normal distribution
and a constant variance. [KBL04]


homotopic transformation: A continuous
deformation that preserves
the connectivity of object features
(e.g., skeletonization). Two objects are
homotopic if they can be made the
same by some series of homotopic
transformations. [SHB08:11.5.1]


Hopfield network: A form of neural
network that is an undirected graphical
model with pairwise potentials. Hopfield
networks have been used as
associative networks and to solve
optimization problems. [Mac03:Ch. 42]


horizon line: The line defined by all
vanishing points from the same plane.
The most commonly used horizon
line is that associated with the ground
plane: [WP:Horizon#Geometrical_model]
[Figure: vanishing points lying on the horizon line]


Hough forest: An object detection
method extending the Hough
transform such that the part detections
of an individual object cast
probabilistic votes for the location of
the object centroid. The detection
hypotheses (i.e., potential objects)
then correspond to the maxima of the
Hough image, which accumulates the
votes from all parts. A class-specific
random forest maps local image patch
appearance to a probabilistic vote on
the object centroid. [GYR+11]


Hough transform: A technique for 
transforming image features directly 
into the likelihood of occurrence of 
some shape. See Hough transform 
line finder and generalized Hough 
transform. [Low91:9.3] 


Hough transform line finder: A version
of the Hough transform based
on the parametric equation of a line
(s = i cos θ + j sin θ) in which a set of
edge points {(i, j)} is transformed into
the likelihood of a line being present
as represented in a (s, θ) space. The
likelihood is quantified, in practice, by
a histogram of the (s, θ) values
observed in the images. [Low91:9.3.1]
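
The voting scheme can be sketched with a dictionary used as the (s, θ) histogram. The function names, the number of θ samples and the bin width for s are assumptions chosen for illustration; θ is sampled over [0, π), so s may be negative.

```python
import math


def hough_lines(edge_points, num_theta=180, s_step=1.0):
    """Accumulate votes in (s, theta) space for s = i*cos(theta) + j*sin(theta)."""
    accumulator = {}
    for i, j in edge_points:
        for t in range(num_theta):
            theta = math.pi * t / num_theta
            s = i * math.cos(theta) + j * math.sin(theta)
            key = (round(s / s_step), t)  # quantize s; t indexes theta
            accumulator[key] = accumulator.get(key, 0) + 1
    return accumulator


def strongest_line(accumulator, num_theta=180, s_step=1.0):
    """Return (s, theta, votes) for the accumulator bin with the most votes."""
    (s_bin, t), votes = max(accumulator.items(), key=lambda kv: kv[1])
    return s_bin * s_step, math.pi * t / num_theta, votes


# Ten edge points on the line j = 5, spread along i.
points = [(i, 5) for i in range(0, 100, 10)]
s, theta, votes = strongest_line(hough_lines(points))
print(s, math.degrees(theta), votes)
```

Every edge point votes once per sampled θ, so the bin that corresponds to a line through many points collects many votes; here all ten points agree on s = 5 at θ = 90 degrees.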
HSI: Hue-saturation-intensity color
image format. [JKS95:10.4]


HSL: Hue-saturation-luminance color
image format. [WP:HSL_and_HSV]
[Figure: a color image and its saturation and luminance components]


HSV: Hue-saturation-value color image 
format. [FP03:6.3.2] 
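
For a single pixel, conversions from RGB to the HSV and HSL representations can be sketched with Python's standard `colorsys` module. The wrapper names are illustrative; note that `colorsys` expects components in the range 0 to 1, returns hue as a fraction of a full turn, and orders the second conversion as hue, lightness, saturation (HLS).

```python
import colorsys


def rgb_to_hsv_degrees(r, g, b):
    """Return (hue in degrees, saturation, value) for RGB components in [0, 1]."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return h * 360.0, s, v


def rgb_to_hsl_degrees(r, g, b):
    """Return (hue in degrees, saturation, lightness) for RGB components in [0, 1]."""
    h, l, s = colorsys.rgb_to_hls(r, g, b)  # colorsys uses H, L, S order
    return h * 360.0, s, l


print(rgb_to_hsv_degrees(1.0, 0.5, 0.0))  # an orange: hue of about 30 degrees
print(rgb_to_hsl_degrees(1.0, 0.5, 0.0))
```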


Hu moment: A set of seven
rotation invariant, scale invariant
and translation invariant
shape moments formed by combinations
of central moments. [SB11:9.4]


hue: Describes color using the dominant 
wavelength of the light. Hue is a com- 
mon component of color image for- 
mats. [FP03:6.3.2] 


Hueckel edge detector: A parametric 
edge detector that models an edge 
using a parameterized model within a 
circular window (the parameters are 
edge contrast, edge orientation,
distance and background mean intensity).
[Dav75] 


Huffman encoding: An optimal, 
variable-length encoding of values 
(e.g., pixel values) based on the 
relative probability of each value. The 
code lengths may change dynamically 
if the relative probabilities of the data 
source change. This technique is com- 
monly used in image compression. 
[Low91:15.3] 
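
The static form of the code construction can be sketched with a priority queue: the two least frequent groups of symbols are merged repeatedly, and each merge prepends one bit to the codes of the symbols involved. The function name is illustrative, and the dynamic (adaptive) variant mentioned above is not covered by this sketch.

```python
import heapq
from collections import Counter


def huffman_code(values):
    """Return a dict mapping each distinct value to its Huffman bit string."""
    freq = Counter(values)
    if not freq:
        return {}
    if len(freq) == 1:  # a single symbol still needs one bit
        return {next(iter(freq)): "0"}

    # Heap entries: (count, unique tie-breaker, {symbol: code-so-far}).
    heap = [(count, idx, {sym: ""}) for idx, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, codes1 = heapq.heappop(heap)
        c2, _, codes2 = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in codes1.items()}
        merged.update({sym: "1" + code for sym, code in codes2.items()})
        heapq.heappush(heap, (c1 + c2, next_id, merged))
        next_id += 1
    return heap[0][2]


pixels = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2
codes = huffman_code(pixels)
print(codes)
print(sum(len(codes[p]) for p in pixels), "bits instead of", 2 * len(pixels))
```

Frequent values receive short codes and rare values long ones, and no code is a prefix of another, so the encoded bit stream can be decoded without separators.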


human-computer interaction: The 
study of the methods and practices of 
interaction between human users and 
computer systems. Often this study is 
undertaken in order to better design 
hardware and software interfaces or 
to improve workflow when using soft- 
ware. 


human motion analysis: A general 
term describing the application of 
motion analysis to human subjects. 
Such analysis is used to track moving 
people, to recognize the pose of a per- 
son and to derive 3D properties. [WP: 
Motion_analysis#Human_motion_ 
analysis] 


human motion tracking: Computing 
the location of human subjects in 
video sequences. There are two typi- 
cal approaches: model-based and non- 
model-based, depending on whether 
predefined shape models are used. The 
representation used is often that of a 
stick figure. See model-based tracking 
and tracking. 


human pose estimation: Image-based 
goniometric analysis of human joint 
poses. This is usually done following 
a skeleton-fitting from human motion 
tracking. [VMZ08]


HYPER: HYpothesis Predicted and
Evaluated Recursively. A well-known vision
system developed by Nicholas Ayache 
and Olivier Faugeras, in which geo- 
metric relations derived from polygo- 
nal models are used for recognition. 
[AF86] 


hyperbolic surface region: A region 
of a 3D surface that is locally saddle- 
shaped. A point on a surface at which 
the Gaussian curvature is negative (so 
the signs of the principal curvatures are 
opposite). [EF97]


hyperfocal distance: The distance D at 
which a camera should be focused in 
order that the depth of field extends 
from D/2 to infinity. Equivalently, if 
a camera is focused at a point at dis- 
tance D, points at D/2 and infinity are 
equally blurred. [JKS95:8.3] 


hyperparameter: In a Bayesian 
statistical model, the prior distribution 
for the model parameters θ may
depend on some parameters φ; these
are known as hyperparameters, to 
distinguish them from the model 
parameters. [GCSR95:Ch. 5] 


hyperplane: A geometrical construct 
which extends the idea of a plane 
in three dimensions to a general 
d-dimensional space. It is the set 
of points x = (x_1, x_2, ..., x_d)^T
satisfying the equation
a_1 x_1 + ... + a_d x_d = c,
where a = (a_1, ..., a_d)^T is a fixed
vector and c is a
constant. [Nob69:p. 183]


hyperquadric: A class of volumetric 
representations that include 
superquadrics. Hyperquadric mod- 
els can describe arbitrary convex 
polyhedra. [SQ04:9.11] 


hyperspectral image: An image with a 
large number (perhaps hundreds) of 
spectral bands. An image with a lower
number of spectral bands is referred
to as a multi-spectral image.
[WP:Hyperspectral_imaging]


hyperspectral sensor: A sensor capable
of collecting many (perhaps hundreds
of) spectral bands simultaneously.
Produces a hyperspectral image.
[WP:Hyperspectral_imaging] 


hypothesis testing: Testing of an 
assumption about a parameter which 
may or may not be true. This could 
be performed by examining the entire 
population but that is often impractical 
so a sample is tested. If this sample is 
consistent with the hypothesis then it 
is accepted, else it is rejected. See also 
RANSAC. [Bar12:12.6] 


hypothesize and test: See hypothesize 
and verify. 


hypothesize and verify: A common 
approach to object recognition in 
which possibilities (of object type and 
pose) are hypothesized and then evalu- 
ated against evidence from the images. 
This is done either until all possibili- 
ties are considered or until a hypothe- 
sis with a sufficiently high degree of fit 
is found: [JKS95:15.1] 


[Figure: a 3 by 3 jigsaw with one empty position (What piece goes here?), showing the possible hypotheses and the hypotheses which do not need to be considered]


hysteresis tracking: See thresholding 
with hysteresis. 


ICA: See independent component
analysis. [WP:Independent_component_analysis]


iconic: Having the characteristics of an 
image. See iconic model. [SQ04:4.1.1] 


iconic model: A representation having 
the characteristics of an image. For 
example the template used in template 
matching. [SQ04:4.1.1] 


iconic recognition: Object recognition 
using an iconic model. [FH05] 


ICP: See iterative closest point. 


ideal line: A line described in the con- 
tinuous domain as opposed to one in 
a digital image, which will suffer from 
rasterization. [HZ00:1.2.2] 


ideal point: A point described in the con- 
tinuous domain as opposed to one in 
a digital image, which will suffer from 
rasterization. May also be used to refer 
to a vanishing point. [HZ00:1.2.2] 


IDECS: Image discrimination enhance- 
ment combination system. A well- 
known vision system developed by 
Haralick and Currier. [HC77] 


identification: The process of associat- 
ing some observations with a particu- 
lar instance or class of object that is 
already known. [TV98:10.1] 


identity verification: Confirmation of 
the identity of a person based on some 
biometrics (e.g., face authentication). 
This differs from the recognition of 
an unknown person in that only one 
model has to be compared with the 
information that is observed. [BAM99] 


IGS: Interpretation guided segmentation.
A vision technique for grouping image
elements into regions based on semantic
interpretations in addition to raw
image values. Developed by Tenenbaum
and Barrow. [TB77]


IHS: Intensity hue saturation color image 
format. [BB82:2.2.5] 


IIR: See infinite impulse response filter. 
[WP:Infinite_impulse_response] 


ill-posed problem: A mathematical 
problem that infringes at least one of 
the conditions in the definition of well- 
posed problem. Informally, these are 
that the solution must exist, be unique 
and depend continuously on the data. 
Ill-posed problems in computer vision 
have been approached using regu- 
larization theory. See regularization. 
[SQ04:6.2.1] 


illuminance: The total amount of
visible light incident upon a point on
a surface. Measured in lux (lumens
per meter squared), or footcandles
(lumens per foot squared). Illuminance
decreases as the distance between
the surface and the source increases.
[JKS95:9.1.1]


illuminant direction: The direction 
from which illuminance originates. See 
also light source geometry. [TV98:9.3] 


illumination: See illuminance. 


illumination constancy: The phe- 
nomenon that allows humans to per- 
ceive the lightness (or brightness) 
of surfaces as approximately constant 
regardless of the illuminance. [HS94] 


illumination estimation: Methods for 
computing the location and direction 
of light sources in an image using shad- 
ing, shadow and specular reflections. 
[LLLS03]


illumination field calibration:
Determination of the illuminance falling
on a scene. Typically this is done by
taking an image of a white object of
known brightness. [HS96]


illumination model: An expression of 
the components of light reflected from 
or refracted through a surface. There 
are three basic components to the 
standard illumination model: ambient 
light, diffuse illumination, and specular 
reflection. [FDFH96:Ch. 16] 


illusory contour: A perceived border 
where there is no edge present in 
the image data. See also subjective 
contour. The figure shows the Kanizsa 
triangles: [FP03:14.2] 


image: A function describing some quan- 
tity (such as brightness) in terms of spa- 
tial layout (see registration). Most fre- 
quently, computer vision is concerned 
with two-dimensional digital images. 
[SB11:1.1] 


image addition: See pixel addition 
operator. 


image alignment: The process of spa- 
tially matching (and aligning) one 
image, sometimes called a template 
image, with another image such that 
the transformation between them can 
be estimated. Also known as image 
registration. [SB11:Ch. 7] 


image analysis: A general term covering 
all forms of analysis of image data. Gen- 
erally image analysis operations result 
in a symbolic description of the image 
contents. [Jai89:1.5] 


image annotation: The process of docu- 
menting the object contents or context 
of an image. See annotation. 


image acquisition: See image capture.


image arithmetic: A general term
covering image-processing operations
that are based on the application of
an arithmetic or logical operator to 
two images. See also pixel addition 
operator, pixel subtraction operator, 
pixel multiplication operator, pixel 
division operator, image blending, 
AND operator, NAND operator, OR 
operator, XOR operator and XNOR 
operator. [SB11:3.2] 


image based: A general term describing 
operations or representations that are 
based on images. [WP:Image_analysis] 


image-based lighting: Image rendering 
technique which uses specialized hard- 
ware to capture a 3D wrap-around 
view of the lighting surrounding an 
object and then compute the light- 
ing values for the scene based on the 
brightness of this view. [RWPD05] 


image-based modeling: 3D modeling 
approach that takes a single image as 
input and represents the scene as a lay- 
ered collection of depth images which 
it then attempts to compute, usually 
with user assistance. [OCDD01] 


image-based rendering: Techniques 
used to render novel views directly 
from input images without knowing 
the full 3D geometry. See also view 
interpolation. [Sze10:13.1] 


image blending: An image arithmetic 
operation similar to that using the pixel 
addition operator where a new image 
is formed by blending the values of 
corresponding pixels from two input 
images. Each input image is given a 
weight for the blending so that the total 
weight is 1.0. [SB11:3.2.1.1]
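
A weighted blend of two images of the same size can be sketched as follows. The function name and the list-of-rows image representation are assumptions; only one weight is passed in, since the second is fixed by the requirement that the weights sum to 1.0.

```python
def blend_images(image_a, image_b, weight_a=0.5):
    """Blend corresponding pixels: weight_a * a + (1 - weight_a) * b."""
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie between 0 and 1")
    weight_b = 1.0 - weight_a
    return [[weight_a * a + weight_b * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(image_a, image_b)]


print(blend_images([[0, 100]], [[200, 100]], weight_a=0.25))  # [[150.0, 100.0]]
```

Because the weights sum to 1.0, the blended values stay within the intensity range of the inputs, which is the practical difference from plain pixel addition.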


image capture: The acquisition of an 
image by a recording device, e.g., a 
camera. [TV98:2.3] 


image classification: An image 
segmentation approach where all pix- 
els in an image are placed into a finite 
number of sets or classes depending 
on classification criteria. An example 
use is that of the classification of land 
types in satellite imagery. [HSD73] 


image coding: The mapping or algo- 
rithm required to encode or decode 
an image representation (such as a 
compressed image). [WP:Graphics_ 
Interchange_Format#Image_coding] 


image compression: A method of rep- 
resenting an image in order to reduce 
the amount of storage space that it 
occupies. Techniques can be lossless 
(which allows all image data to be 
recorded perfectly) or lossy (where 
some loss of quality is allowed, typ- 
ically resulting in significantly better 
compression rates). [SB11:1.3.2] 


image connectedness: See pixel
connectivity.


image coordinates: See image plane 
coordinates and pixel coordinates. 


image database indexing: The tech- 
nique of associating an index (e.g., key- 
words) with images, which allows the 
images to be indexed efficiently within 
a database. [SQ04:13A.3] 


image decomposition: A general term 
referring to the separation of an 
image into its constituent parts follow- 
ing an established set of basis func- 
tions (e.g., Fourier transform, wavelet 
transform, DCT or similar), a given 
set of color channels (e.g., RGB, HSV) 
or segmentation (including semantic 
scene segmentation). [Bov05:p. 973, 
p. 354] 


image denoising: The process of remov- 
ing corruption in an image (caused 
by noise). There are three broad cat- 
egories of approach: those related to 
anisotropic diffusion or smoothing; 
those exploiting natural image statis- 
tics learned over a training set; sam- 
pling techniques that combine infor- 
mation from several image patches that 
are similar in appearance to the one 
under examination. The figure shows 
denoising by smoothing: [CYV00] 


[Figure: noisy image; after denoising]


image descriptor: A set of short vec- 
tors derived from an image, which are 
invariant to common transformations, 
providing a summary description that 
can be compared with other descrip- 
tors in a database to obtain matches 
according to some distance metric. 
Common examples of such descriptors 
are SIFT and SURF. [Sze10:4.1.2] 


image difference: See pixel subtraction 
operator. 


image digitization: The process of 
sampling and quantization of an ana- 
log image function to create a digital 
image. [Nal93:2.3.1] 


image distortion: Any effect that alters 
an image from the ideal image. 
Most typically this term refers to 
geometric distortions, although it can 
also refer to other types of distor- 
tion such as image noise and effects 
of sampling and quantization:
[WP:Distortion_(optics)]
[Figure: correct image; distorted image]


image encoding: The process of 
converting an image into a dif- 
ferent representation. For example, 
see image compression. [WP:Image_compression]


image enhancement: A general 
term covering a number of 
image-processing operations that 
alter an image in order to make it eas- 
ier for humans to perceive. Example 
operations include contrast stretching 
and histogram equalization. The 
figure shows a histogram equalization 


operation: [SB11:Ch. 4] 
ma 
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image epitome: A useful data-mining 
representation of an image which con- 
tains abbreviated statistical properties 
rather than the data itself. [JFK03] 


image feature: A general term for an
interesting image structure that could 
arise from a corresponding interesting 
scene structure. Features can be single 
points such as interest points, curve 
vertices, edges, lines, curves, surfaces 
etc. [TV98:4.1] 


image feature extraction: A group 
of image-processing techniques con- 
cerned with the identification of partic- 
ular image features. Examples include 
edge detection and corner detection. 
[TV98:4.1] 


image flow: See optical flow. 


image formation: A general term cov- 
ering issues relating to the manner in 
which an image is formed. For exam- 
ple in the case of a digital camera this 
term would include the camera geom- 
etry as well as the process of sampling 
and quantization. [SB11:2.1] 


image gallery: A collection of images 
used for statistical, texture or feature 
analysis or some other comparison. 


image grid: A geometric map describ- 
ing the image sampling in which every 
image point is represented by a vertex 
(or hole) in the map or grid. [FH04]


image indexing: See image database 
indexing. 


image intensifier: A device for ampli- 
fying an image, so that the resultant 
sensed luminous flux is significantly 
higher. [WP:Image_intensifier] 


image interleaving: Describes the way 
in which image pixels are organized, 
e.g., pixel interleaving (where the 
image data is ordered by pixel posi- 
tion) and band interleaving (where the 
image data is ordered by band and is 
then ordered by pixel position within 
each band). [WP:Interleaving] 
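
As a rough illustration of the two orderings described above, the following Python sketch converts a flat list of samples between a pixel-interleaved and a band-interleaved layout. The function names and the flat-list storage are assumptions made for this example and do not correspond to any particular file format.

```python
def pixel_to_band_interleaved(data, bands):
    """Reorder [r0, g0, b0, r1, g1, b1, ...] into [r0, r1, ..., g0, g1, ..., b0, b1, ...]."""
    return [data[i] for b in range(bands) for i in range(b, len(data), bands)]


def band_to_pixel_interleaved(data, bands):
    """Inverse reordering: gather one sample per band for each pixel position."""
    n_pixels = len(data) // bands
    return [data[b * n_pixels + p] for p in range(n_pixels) for b in range(bands)]


# Two RGB pixels stored pixel by pixel (hypothetical sample labels).
pixel_order = ["r0", "g0", "b0", "r1", "g1", "b1"]
band_order = pixel_to_band_interleaved(pixel_order, bands=3)
```

Converting `band_order` back with `band_to_pixel_interleaved` recovers the original list.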


image interpolation: A method for 
computing a value for a pixel in 
an output image based on non- 
integer coordinates in some input 
image. The computation is based 
on the values of nearby pixels in 
the input image. This type of opera- 
tion is required for most geometric 
transformations and computations 
requiring subpixel interpolation. 
Types of interpolation scheme
include nearest-neighbor interpola- 
tion, bilinear surface interpolation, 
bicubic interpolation etc. The figure 
shows the result of interpolation in 
image enlargement: [Sch89:Ch. 2] 


Enlarged image using bicubic 
interpolation 
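
A minimal sketch of one of the schemes named in the image interpolation entry, bilinear interpolation, is given below. It assumes the image is stored as a list of rows and that coordinates outside the image are clamped to the border; both are choices made for this example.

```python
import math


def bilinear(image, x, y):
    """Sample `image` (a list of rows) at non-integer (x, y); x is the column, y the row."""
    h, w = len(image), len(image[0])
    # Clamp so that the 2x2 neighbourhood stays inside the image.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
    bottom = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
    return (1 - fy) * top + fy * bottom


patch = [[0.0, 10.0], [20.0, 30.0]]
centre_value = bilinear(patch, 0.5, 0.5)
```

Nearest-neighbor interpolation would instead round (x, y) to the closest pixel, which is cheaper but gives blockier enlargements.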


image interpretation: A general term 
for computer-vision processes that 
extract descriptions from images (as 
opposed to processes that produce 
output images for human viewing). 
There is often the assumption that the 
descriptions are very high-level, e.g., 
“the boy is walking to the store carry- 
ing a book” or “these cells are cancer- 
ous”. A broader definition would also 
allow processes that extract informa- 
tion needed by a subsequent (usually 
non-image-processing) activity, e.g., 
the position of a bright spot in an 
image. [MZ89] 


image invariant: An image feature or
measurement image that is invariant
to some properties. For example,
invariant color features are often
used in image database indexing. [WP: 
Image_moment] 


image irradiance equation: Usually 
expressed as E(x, y) = R(p, q), this 
equality (up to a constant scale factor 
to account for illumination strength, 
surface color and optical efficiency) 
says that the observed brightness E at 
pixel (x, y) is equal to the reflectance 
R of the surface for surface normal 
(p, q, -1). Usually there is a
one-degree-of-freedom family of surface
normals with the same reflectance 
value so the observed brightness only 
partially constrains local surface orien- 
tation and thus shape. [JKS95:9.3.1] 


image magnification: The extent to 
which an image is expanded for 
viewing. If the image size is actu- 
ally changed then image interpolation 
must be used. Normally quoted relative 
to the original size (e.g., ×2, ×10 etc.):
[Jai89:7.4]


[Figure: magnified image (×4)]


image matching: 1) The comparison 
of two images, often evaluated using 
cross correlation. See also template 
matching: [TV98:10.4.2] 


[Figure: Image 1, Image 2 and the locations where Image 2 matches Image 1]


2) Finding images in a database that 
correspond in some way to a given 
exemplar. [Sze10:9.2] 




image memory: See frame store. 


image modality: A general term for the 
sensing technique used to capture an 
image, e.g., visible light, infrared light 
or X-ray. [KH07]


image morphing: A gradual transforma- 
tion from one image to another: [WP: 
Morphing] 


image morphology: An approach to 
image processing that considers all 
operations in terms of set opera- 
tions. See mathematical morphology 
operation. [WP:Mathematical_morph- 
ology] 


image mosaic: A composition of several 
images, to provide a single larger image
covering a wider field of view. The fig- 
ure shows a mosaic of three images: 
[Sch89:Ch. 2] 


image motion estimation: Compu- 
tation of optical flow for all pix- 
els or features in an image. [WP: 
Motion_estimation] 


image multiplication: See pixel
multiplication operator.


image noise: Degradation of an image 
where pixels have values which are 
different from the ideal values. Often 
noise is modeled as having a Gaus- 
sian distribution with a zero mean, 
although it can take on different forms 
such as salt-and-pepper noise depend- 
ing upon the cause of the noise (e.g., 
the environment or electrical interfer- 
ence). Noise is measured in terms of 
the signal-to-noise ratio: [SB11:2.3.3] 


[Figure: original image; image with Gaussian noise; image with salt-and-pepper noise]


image normalization: Reducing or 
eliminating the effects of different 
illumination on the same or similar 
scenes. A typical approach is to sub- 
tract the mean of the image and divide 
by the standard deviation, which 
produces a zero-mean, unit variance 
image. Since images are not Gaussian 
random samples, this approach does 
not completely solve the problem. Fur- 
ther, light source placement can also 
cause variations in shading that are 
not corrected by this approach. The 
figure shows an original image (left) 
and its normalization (right): [WP:
Normalization_(image_processing)]
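
The typical approach described in the image normalization entry (subtract the mean, divide by the standard deviation) can be sketched as follows. The list-of-rows image layout and the handling of a constant image are assumptions of this example.

```python
import math


def normalize(image):
    """Return a zero-mean, unit-variance copy of `image` (a list of rows)."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / len(pixels))
    if std == 0:
        # A constant image has no contrast to rescale; return all zeros.
        return [[0.0 for _ in row] for row in image]
    return [[(p - mean) / std for p in row] for row in image]


normalized = normalize([[1, 2], [3, 4]])
```

A global change of gain and offset (e.g., every pixel mapped to 2p + 10) leaves the normalized result unchanged, but, as the entry notes, spatially varying shading is not corrected.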


image of absolute conic: See absolute 
conic. 


image orthicon tube: Electron emission 
tube used in early television sets where 
the number of electrons produced in 
the tube corresponded with the inten- 
sity of secondary emission of electrons 
on the screen itself. [Abr03] 


image pair rectification: See image 
rectification. 


image parsing: Methods used to find a 
semantically meaningful label for every 
pixel in a given image. [TCYZ05] 


image plane: The mathematical plane 
behind the lens onto which an image is 
focused. In practice, the physical sens- 
ing surface aims to be placed here, but 
its position will vary slightly because of 
minor variations in sensor shape and 
placement. The term is also used to 
describe the geometry of the image 
recorded at this location: [JKS95:1.4] 


[Figure: image plane and optical axis]


image plane coordinates: The position
of points in the physical image sensing
plane. They have physically meaningful
values, such as centimeters, that
can be converted to pixel coordinates,
which are in pixels. The two meanings
are sometimes used interchangeably.
[JKS95:1.6]


image processing: A general term covering
all forms of processing of captured
image data. It can also mean processing
that starts from an image and
results in an image, rather than ending
with symbolic descriptions of the
image contents or scene. [JKS95:1.2]


image-processing operator: A function
that may be applied to an image
in order to transform it in some way.
See also image processing. [Dav90:2.2]


image pyramid: A hierarchical
representation in which each level
contains a smaller version of the image
at the previous level. Often pixel
values are obtained by a smoothing
process. Usually the reduction is by
a power of two (i.e., 2 or 4). The
figure shows four levels of a pyramid
in which each level is formed by
averaging together two pixels from
the previous layer:


The levels are enlarged to the original
image size for inspection of the effect
of the compression. [FP03:7.7]


image quality: A general term, usually
referring to the extent to which
the image data records the observed
scene faithfully. The specific issues
that are important to image quality are
problem specific but may include low
image noise, high image contrast, good
image focus, low motion blur etc. [WP:
Image_quality]


image querying: A shorthand term
for image database indexing. This
is often based on color, texture or
shape indices. The database keys
could be based on global or local
measures. [WP:Content-based_image_
retrieval#Query_techniques]


image reconstruction: A term used in
image compression to describe the
process of recreating a digital image
from some compressed form. [HL94]
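
For the image pyramid entry above, a small sketch of the level-by-level reduction may help. It assumes each level halves both dimensions by averaging 2×2 blocks, which is only one possible smoothing choice; the function names are hypothetical.

```python
def reduce_level(image):
    """Halve width and height by averaging each non-overlapping 2x2 block."""
    h, w = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(w)
        ]
        for r in range(h)
    ]


def build_pyramid(image, levels):
    """Return [level 0 (original), level 1, ...] with at most `levels` entries."""
    pyramid = [image]
    for _ in range(levels - 1):
        current = pyramid[-1]
        if len(current) < 2 or len(current[0]) < 2:
            break  # cannot reduce a 1-pixel-wide or 1-pixel-high level further
        pyramid.append(reduce_level(current))
    return pyramid


# A 4x4 test image whose pixel value is 4 * row + column.
base = [[4 * r + c for c in range(4)] for r in range(4)]
pyramid = build_pyramid(base, levels=3)
```

Practical implementations often smooth with a Gaussian kernel before subsampling rather than using a plain block average.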


image rectification: A warping of a
stereo pair of images such that con- 
jugate epipolar lines (defined by the 
two cameras’ epipoles and any 3D 
scene point) are collinear. Usually
the lines are transformed to be par- 
allel to the horizontal axis so that 
corresponding image features can be 
found on the same raster line. This 
reduces the computational complexity 
of the stereo correspondence problem. 
[FP03:11.1.1] 


image registration: See registration. 


image representation: A general term
for how the image data is represented. 
Image data can be in one, two, three or 
more dimensions. Image data is often 
stored in arrays where the spatial lay- 
out of the array reflects the spatial lay- 
out of the data. The figure shows a 
small 10 x 10 pixel image patch with 
the gray scale values for the corre- 
sponding pixels: [Jai89:1.2] 


123 123 123 123 123 123 123 123 96 96
123 123 112 96 96 123 123 123 123 96
123 123 96 96 112 123 137 123 123 96
123 123 96 96 123 214 234 178 123 96
123 100 72 109 178 230 230 137 123 96
125 78 51 142 218 178 96 76 96 96
92 100 92 92 81 76 76 96 123 123
81 109 129 129 100 81 92 123 123 123
$1 109 142 137 123 123 123 123 123 123 
33 76 123 123 137 137 123 123 123 123 


image resolution: Usually used to
record the number of pixels in the 
horizontal and vertical directions in an 
image but may also refer to the separa- 
tion between pixels (e.g., 1 um) or the 
angular separation between the lines of 
sight corresponding to adjacent pixels. 
[SB11:1.2] 


image restoration: The process of
removing some known (and modeled) 
distortion from an image, such as blur 
in an out-of-focus image. The process 
may not produce a perfect image, 
but may remove an undesired distor- 
tion (e.g., motion blur) at the cost 
of another ignorable distortion (e.g., 
phase distortion). [SB11:Ch. 6] 


image retrieval: Search method for
comparing images in a database with 
an exemplar (either an image or some 
text) and then presenting those images 
which most closely match the given 
criteria. [Sze10:Ch. 14] 


image sampling: The process of measuring
some pixel values from the
physical image focused onto the
image plane. The sampling could be
monochrome, color or multi-spectral,
such as RGB. The sampling usually
results in a rectangular array of pixels
sampled at nearly equal spacing but
other sampling could be used, such as
a space-variant sensor. [Nal93:2.3.1]


image scaling: The operation of increasing
or reducing the size of an image by
some scale factor. This operation may 
require the use of some type of image 
interpolation method. See also image 
magnification. [WP:Image_scaling] 


image segmentation: The grouping of
image pixels into meaningful, usually 
connected, structures such as curves 
and regions. The term is applied to 
a variety of image modalities, such 
as intensity data or range data and 
properties, such as similar feature 
orientation, feature motion, surface 
shape or texture. [SB11:10.1] 


image semantics: A general term referring
to the meaning or understanding
of image content. See semantic scene
understanding. [SWS+00]


image sequence: A series of images gen- 
erally taken at regular intervals in time. 
Typically the camera or the objects in 
the scene (or all of them) will be mov- 
ing: [TV98:8.1] 


image sequence analysis: Techniques
for merging, segmenting or understanding
image content which utilize
the temporal aspect of the image content
(e.g., change detection). Particular
areas of usage are compression
algorithms, such as MPEG-4, and
behavior analysis.


image sequence fusion: The integration
of information from the many
images in an image sequence. Different 
types of fusion include 3D structure 
recovery, production of a mosaic of 
the scanned scene, tracking of a mov- 
ing object, improved scene imaging 
because of average smoothing etc. 
[WP:Image_fusion] 


image sequence matching: Comput-
ing the correspondence between pix- 
els or image features in frames of the 
image sequence. With the correspon- 
dences, one can construct an image 
mosaic, carry out image stabilization 
(to remove jitter) or recover scene 
structure. [Moh98] 


image sequence stabilization: Hand- 
held video camera recordings contain 
some image motion because of jitter 
from the human operator. Image stabi- 
lization attempts to estimate the ran- 
dom portion of the camera motion 
jitter and translate the images in the 
sequence to reduce or remove the jit- 
ter. A similar application would be to 
remove systematic camera motions to 
produce a motionless image. See also 
feature stabilization and translation. 
[WP:Image_stabilization] 


image sharpening operator: An image 
enhancement operator that increases 
the high spatial frequency component 
of the image, so as to make the edges of 
objects appear sharper or less blurred. 
See also edge enhancement. The figure 
shows a raw image (left) and an image 
sharpened with the unsharp operator 
(right): [SB11:4.6] 


image size: The number of pixels in an 
image, e.g., 768 horizontally by 494 
vertically. [WP: Wikipedia: What_is_ 
a_featured_picture%3F/Image_size] 


image smoothing: See noise reduction. 


image stabilization: See image
sequence stabilization. [WP:Image_
stabilization]


image stitching: Methods of combin- 
ing multiple images taken with differ- 
ent orientations or geometry into a sin- 
gle panoramic image mosaic. The fig- 
ure is composed from 12 images with 
rotational and translational motion
between them: [Sze10:9.3.3] 


image storage devices: See frame store. 


image subtraction operator: See pixel 
subtraction operator. 


image tagging: Automatic or manual 
assignment of relevant keywords to 
images in a collection in order to aid 
retrieval. See annotation. 


image texton: Proposed elementary 
units of texture perception which are 
somewhat analogous to the phonemes 
used in speech recognition. [MBSL99] 


image transfer: 1) See novel view 
synthesis. 
2) A general term describing the 
movement of an image from one 
device to another or from one rep- 
resentation to another. [WP:Picture_ 
Transfer_Protocol] 


image transform: An operation on an 
image that produces another image. 
The new image may have changed 
geometry from the original or may 
contain new, derived information. The 
usual purpose of applying such a 
transformation is to enhance or make 
explicit some desired information. 
[SB11:Chs 3, 5 & 7] 


image understanding: A general term 
referring to the derivation of high-level 
(abstract) information from an image 
or series of images. This term is often 
used to refer to the emulation of human 
visual capabilities. [Jai89:9.15] 


Image Understanding Environment
(IUE): A C++-based collection of data
types (classes) and standard com- 
puter vision algorithms. The motiva- 
tion behind the development of the 
IUE was to reduce the independent 
re-invention of basic computer vision 
code in government-funded research 
into computer vision. [HR92] 


image warping: A general term for 
transforming the positions of pixels 
in an image, usually while maintain- 
ing image topology (i.e., neighboring 
original pixels remain neighbors in the 
warped image). This results in an image 
with a new shape. This operation 
might be done, e.g., to correct some 
geometric distortion, align two images 
(see image rectification), or transform 
shapes into a more easily processed 
form (e.g., circles into straight lines). 
[SB11:7.10] 


image watermarking: See digital image 
watermarking. 


imaging geometry: A general term 
referring to the relative placement of 
sensors, structured light sources, point 
light sources etc. [BB82:2.2.2] 


imaging spectroscopy: The acquisi- 
tion and analysis of surface composi- 
tion by using image data from multi- 
ple spectral channels. A typical sensor 
(AVIRIS) records 224 measurements at 
10 nm increments from 400 nm to 
2500 nm. The term might refer to the 
raw multidimensional signal or to the 
classification of that signal into sur- 
face types (e.g., vegetation or mineral 
types). [WP:Imaging_spectroscopy]


imaging surface: The surface within a 
camera on which the image is pro- 
jected by the lens. This surface in a 
digital camera is composed of pho- 
tosensitive elements that record the 
incident illumination. See also image 
plane. [PBP01] 


implicit curve: A curve that is defined
by an equation of the form $f(x) = 0$.
Then the curve is the set of points
$S = \{x \mid f(x) = 0\}$. [FP03:15.3.1]


implicit surface: The representation of
a surface as the set of points that makes
a function have the value zero. For
example, the sphere $x^2 + y^2 + z^2 = r^2$
of radius $r$ at the origin could be
represented by the function
$f(x, y, z) = x^2 + y^2 + z^2 - r^2$. The set of points
where $f(x, y, z) = 0$ is the implicit
surface. [SQ04:4.1.2]
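
The sphere example above translates directly into code. This sketch assumes a small numerical tolerance for the zero test, since computed values of the function are rarely exactly zero; the names are illustrative only.

```python
def sphere_implicit(x, y, z, r=1.0):
    """Implicit function of a sphere of radius r centred at the origin."""
    return x * x + y * y + z * z - r * r


def on_surface(f_value, tol=1e-9):
    """A point lies on the implicit surface when the function value is (nearly) zero."""
    return abs(f_value) <= tol


value_on = sphere_implicit(3.0, 4.0, 0.0, r=5.0)      # point on the sphere
value_inside = sphere_implicit(0.0, 0.0, 0.0, r=5.0)  # centre of the sphere
```

For this particular function the sign also tells inside (negative) from outside (positive), a property that many implicit-surface methods rely on.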


importance sampling: Consider the
integral $\int f(x)\, p(x)\, dx$. This can be
estimated using Monte Carlo methods
by drawing samples from $p(x)$. However,
if it is hard to sample from $p(x)$
we may sample from another distribution
$q(x)$, and reweight each sample
by a factor of $w_i = p(x_i)/q(x_i)$,
known as an importance weight. Subsequently,
the integral is approximated
as $\frac{1}{N} \sum_{i=1}^{N} w_i f(x_i)$, where the samples
$x_1, \ldots, x_N$ are drawn from $q(x)$.
[Bis06:11.1.4]
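
A small numerical sketch of the estimator described in the importance sampling entry follows. The choice of a standard normal target p, a wider normal proposal q, the integrand f(x) = x², the sample count and the fixed seed are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def importance_estimate(f, p_pdf, q_pdf, q_sample, n):
    """Estimate the integral of f(x) p(x) dx using n samples drawn from q."""
    total = 0.0
    for _ in range(n):
        x = q_sample()
        w = p_pdf(x) / q_pdf(x)  # importance weight
        total += w * f(x)
    return total / n


rng = random.Random(0)  # fixed seed so that the run is repeatable
estimate = importance_estimate(
    f=lambda x: x * x,                        # E_p[x^2] = 1 for a standard normal
    p_pdf=lambda x: normal_pdf(x, 0.0, 1.0),  # target distribution p
    q_pdf=lambda x: normal_pdf(x, 0.0, 2.0),  # proposal distribution q
    q_sample=lambda: rng.gauss(0.0, 2.0),
    n=20000,
)
```

The estimate should be close to 1, the variance of the standard normal; the proposal needs to cover the region where f(x) p(x) is large, otherwise the weights become highly variable.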


impossible object: An object that can- 
not physically exist: [Nal93:4.1.1] 


impulse noise: A form of image cor- 
ruption where image pixels have their 
value replaced by the maximum value 
(e.g., 255). See also salt-and-pepper 
noise. The figure shows impulse noise 
on an image: [TV98:3.1.2] 




incandescent lamp: A light source 
whose light arises from the glowing 
of a very hot structure, such as the 
tungsten filament in a light bulb. [WP: 
Incandescent_light_bulb] 


incident light: A general term refer- 
ring to the light that strikes or 
illuminates a surface. [WP:Incident_ 
light#Interaction_with_surfaces] 


incident light measurement: Measur- 
ing the amount of light illuminating 
the subject rather than light reflected 
from the subject, thus ignoring the sub- 
ject and background characteristics. 
[AT10] 


incremental learning: Learning
that is incremental in nature. See 
continuous learning. [WP:Population- 
based_incremental_learning] 


independent component analysis
(ICA): A multi-variate data analysis
method that finds a linear transfor- 
mation to make each component of 
the transformed data vectors indepen- 
dent of each other. Unlike principal 
component analysis, which considers 
only second-order properties (covari- 
ances) and transforms onto basis vec- 
tors that are orthogonal to each 
other, ICA considers properties of 
the whole distribution and trans- 
forms onto basis vectors that need 
not be orthogonal. [WP:Independent_ 
component_analysis] 


independent motion detection: Algo- 
rithmic approaches to the problem 
of detecting independently moving 
objects in a 3D scene when they are 
viewed by a moving camera which also 
causes all pixels to change. [SYAK99] 


index of refraction: The absolute index 
of refraction in a material is the ratio of 
the speed of an electromagnetic wave 
in a vacuum to the speed in the mate- 
rial. More commonly used is the rela- 
tive index of refraction of two media, 
which is the ratio of their absolute 
indices of refraction. This ratio is used 
in lens design and explains the bending 
of light rays as the light passes into a 
new material (Snell’s law). [FP03:1.2.1] 
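
The bending of light mentioned above follows Snell's law, n1 sin(θ1) = n2 sin(θ2). The sketch below assumes angles measured from the surface normal and uses approximate, illustrative index values (1.0 for air, 1.5 for glass).

```python
import math


def refraction_angle(theta_in, n_in, n_out):
    """Angle (radians, from the normal) of the refracted ray, or None if there is none."""
    s = n_in * math.sin(theta_in) / n_out
    if abs(s) > 1.0:
        return None  # total internal reflection: no refracted ray exists
    return math.asin(s)


# Light passing from air into glass at 30 degrees from the normal.
theta_glass = refraction_angle(math.radians(30.0), n_in=1.0, n_out=1.5)
theta_glass_degrees = math.degrees(theta_glass)
```

Only the ratio n_in / n_out enters the calculation, which is why the relative index of refraction is the quantity usually quoted.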


indexing: The process of retrieving an 
element from a data structure using a 
key. A powerful concept imported into 
computer vision from programming. 
For example, the problem of estab- 
lishing the identity of an object given 
an image and a set of candidate mod- 
els is typically approached by locat- 
ing some characterizing elements, or 
features, in the image and using the fea- 
tures’ properties to index a database of 
models. See also model base indexing. 
[FP03:18.4.2] 


industrial vision: A general term cover- 
ing the application of machine-vision 
technology to industrial processes. 
Applications include product inspection,
process feedback and part or tool
alignment. A large range of lighting 
and sensing techniques are used. A 
common feature of industrial vision 
systems is fast processing rates (e.g., 
several times a second), which may 
require limiting the rate at which tar- 
gets are analyzed or limiting the types 
of processing. [MPZ+03] 


infinite impulse response filter (IIR):
A filter that produces an output
value ($y_n$) based on the current
and past input values ($x_i$) together
with past output values ($y_j$):
$y_n = \sum_{i=0}^{p} a_i x_{n-i} + \sum_{j=1}^{q} b_j y_{n-j}$,
where $a_i$ and $b_j$ are weights. [WP:Infinite_
impulse_response]
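
The recurrence in the IIR entry can be evaluated directly, as in this sketch. It assumes samples before the start of the signal are zero and that the feedback weights are passed as a list starting at lag 1; the function name is hypothetical.

```python
def iir_filter(x, a, b):
    """Apply y[n] = sum_i a[i] * x[n-i] + sum_j b[j-1] * y[n-j], with j starting at 1."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for i, a_i in enumerate(a):           # current and past inputs
            if n - i >= 0:
                acc += a_i * x[n - i]
        for j, b_j in enumerate(b, start=1):  # past outputs (feedback)
            if n - j >= 0:
                acc += b_j * y[n - j]
        y.append(acc)
    return y


# Response of a one-pole smoothing filter to a unit impulse.
impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
response = iir_filter(impulse, a=[0.5], b=[0.5])
```

Because of the feedback term the response to a single impulse never becomes exactly zero in exact arithmetic, which is the sense in which the impulse response is "infinite".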


inflection point: A point at which the 
second derivative of a curve changes 
its sign, corresponding to a change in 
concavity. See also curve inflection: 
[FP03:19.1.1] 


[Figure: curve with the inflection point labeled]


influence function: A function describ- 
ing the effect of an individual observa- 
tion on a statistical model. This allows 
us to evaluate whether the observation 
is having an undue influence on the 
model. [WP:Influence_function] 


information fusion: Fusion of informa- 
tion from multiple sources. See sensor 
fusion. [WP:Information_integration] 


infrared: See infrared light. 


infrared imaging: Production of an
image through use of an infrared
sensor. [SB11:3.1] 


infrared light: Electromagnetic energy 
with wavelengths approximately in the 
range 700 nm to 1 mm. Immediately 
shorter wavelengths are visible light 
and immediately longer wavelengths 
are microwave radio. Infrared light is 
often used in machine vision systems
because it is easily observed by most 
semiconductor image sensors yet is not 
visible to humans and because it is a 
measure of the heat emitted by the 
observed scene. [SB11:3.1] 


infrared sensor: A sensor capable of 
observing or measuring infrared light. 
[SB11:3.1] 


inlier: A sample that falls within an 
assumed probability distribution (e.g., 
within the 95th percentile). See also 
outlier. [WP:RANSAC] 


inpainting: The process of replacing 
degraded, corrupted or missing parts 
of an image or video often by using 
statistics-based texture synthesis algo- 
rithms. [Wei03] 


inspection: A general term for visually 
examining a target to detect defects. 
Common practical inspection exam- 
ples include printed circuit boards for 
breaks or solder joint failures, paper 
production for holes or discolorations, 
and food for irregularities. [SQ04:17.4] 


integer lifting: A method used to con- 
struct wavelet representations. [WP: 
Lifting_scheme] 


integer wavelet transform: An inte- 
ger version of the discrete wavelet 
transform. [CDSY98] 


integral image: A data structure,
$I$, (and its associated construction
algorithm) for efficiently generating
the sum of values in an
image, $i$, over a given region of
interest. Defined such that
$I(x, y) = \sum_{x' \le x,\, y' \le y} i(x', y')$.
Computed in a single
pass over an image. Also known as
the summed area table. [Sze10:3.2.2]
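
The construction and use of the table can be sketched as follows, assuming a list-of-rows image and inclusive region bounds (both conventions chosen for this example). Once the table is built, any rectangular sum needs at most four lookups.

```python
def integral_image(image):
    """Build I with I[y][x] = sum of image[y'][x'] for all y' <= y and x' <= x."""
    h, w = len(image), len(image[0])
    table = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += image[y][x]
            table[y][x] = row_sum + (table[y - 1][x] if y > 0 else 0)
    return table


def region_sum(table, x0, y0, x1, y1):
    """Sum of the image over columns x0..x1 and rows y0..y1 (inclusive)."""
    total = table[y1][x1]
    if x0 > 0:
        total -= table[y1][x0 - 1]
    if y0 > 0:
        total -= table[y0 - 1][x1]
    if x0 > 0 and y0 > 0:
        total += table[y0 - 1][x0 - 1]
    return total


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
table = integral_image(image)
lower_right = region_sum(table, 1, 1, 2, 2)  # 5 + 6 + 8 + 9
```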


integral image compression: A three- 
dimensional integral image occupies 
a large amount of data. In order to 
present it with adequate resolution, 
integral image-compression algorithms 
(e.g., using MPEG) take advantage of 
the image’s modality-specific charac- 
teristics. [YSJ04] 


integral invariant: An integral (of some 
function) that is invariant under a set 
of transformations. For example, local 
integrals along a curve of curvature or 
arc length are invariant to rotation and
translation. Integral invariants potentially
have greater stability to noise
than, e.g., differential invariants such 
as curvature. [MHYS04] 


integration time: The length of time 
that a light-sensitive sensor medium is 
exposed to the incident light (or other 
stimulus). Shorter times reduce the sig- 
nal strength and possible motion blur 
(if the sensor or objects in the scene
are moving). [NB04] 


intensity: 1) The brightness of a light 
source. 
2) Image data that records the bright- 
ness of the light that comes from the 
observed scene. [TV98:2.2.3] 


intensity-based database indexing: 
This is a form of image database 
indexing that uses intensity descrip- 
tors such as histograms of pixel values 
(monochrome or color) or vectors of 
local derivative values. [SSTG03]


intensity cross correlation: Cross 
correlation using intensity data. 
[XJM94] 


intensity data: Image data that repre- 
sents the brightness of the measured 
light. There is not usually a linear map- 
ping between the brightness of the 
measured light and the stored values. 
The term can also refer to the intensity 
of observed visible light. [SWP+83] 


intensity gradient: The mathematical
gradient operation $\nabla$ applied to an
intensity image $I$ gives the intensity
gradient $\nabla I$ at each image point. The
intensity gradient direction shows the
local image direction in which the maximum
change in intensity occurs. The
intensity gradient magnitude gives the
magnitude of the local rate of change
in image intensity:


At each of the two designated points, 
the length of the vector shows the 
magnitude of the change in intensity 
and the direction of the vector shows 
the direction of greatest change. [WP: 
Image_gradient] 
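
A discrete approximation of the intensity gradient at one pixel can be sketched with central differences. The difference scheme, the list-of-rows layout and the restriction to interior pixels are assumptions of this example; practical systems often use smoothed derivative operators instead.

```python
import math


def intensity_gradient(image, x, y):
    """Central-difference gradient at interior pixel (x, y).

    Returns (gx, gy, magnitude, direction), with the direction in radians.
    """
    gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
    gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
    return gx, gy, math.hypot(gx, gy), math.atan2(gy, gx)


# A horizontal intensity ramp: brightness increases from left to right.
ramp = [[0, 10, 20], [0, 10, 20], [0, 10, 20]]
gx, gy, magnitude, direction = intensity_gradient(ramp, 1, 1)
```

For the ramp the gradient points along the positive x axis (direction 0) with magnitude 10 gray levels per pixel.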


intensity gradient direction: The local 
image direction in which the maxi- 
mum change in intensity occurs. See 
also intensity gradient. [WP:Gradient# 
Interpretations] 


intensity gradient magnitude: The 
magnitude of the local rate of change 
in image intensity. See also intensity 
gradient. The figure shows (left) a raw 
image and (right) its intensity gradi- 
ent magnitude (the contrast has been 
enhanced for clarity). [WP:Gradient# 
Interpretations] 


intensity histogram: A data structure 
that records the number of pixels of 
each intensity value. A typical gray 
scale image will have pixels with val- 
ues in [0,255]. Subsequently, the his- 
togram will have 256 entries recording 
the number of pixels that had value 
0, the number having value 1 etc. The 
figure shows a dark object against a 
lighter background and its histogram: 
[SB11:3.4] 
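
A minimal sketch of building such a histogram, assuming integer pixel values in [0, 255] and a list-of-rows image:

```python
def intensity_histogram(image, levels=256):
    """Count how many pixels take each intensity value in range(levels)."""
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1
    return hist


tiny = [[0, 0, 255], [1, 0, 255]]
hist = intensity_histogram(tiny)
```

The entries always sum to the number of pixels in the image.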




intensity image: An image that
records the measured intensity data. 
[TV98:2.1] 


intensity level slicing: An
image-processing operation in which 
pixels with values other than the 
selected value (or range of values) are 
set to zero. If the image is viewed as a
landscape, with height proportional to 
brightness, then the slicing operator 
takes a cross section through the 
height surface. The figure shows (left) 
an image and (right) its intensity level 
80 (in black): [Jai89:7.2]
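
The slicing operation can be sketched in a few lines. This version assumes an inclusive range [low, high] and keeps the original value of selected pixels; showing the selected pixels in black, as in the figure described above, is a display choice not modeled here.

```python
def intensity_slice(image, low, high):
    """Keep pixels whose value lies in [low, high]; set all other pixels to zero."""
    return [[v if low <= v <= high else 0 for v in row] for row in image]


sample = [[10, 80, 200], [80, 79, 81]]
level_80 = intensity_slice(sample, 80, 80)   # a single intensity level
band = intensity_slice(sample, 79, 81)       # a narrow range of levels
```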


intensity matching: Finding corresponding
points in a pair of images
by matching the gray scale intensity 
patterns. The goal is to find image 
neighborhoods that have nearly 
identical pixel intensities. All image 
points, feature points or interest points 
could be considered for matching. An 
algorithm where intensity matching 
is used is correlation-based stereo 
matching. [Jia01] 


intensity sensor: A sensor that measures
intensity data. [BM02:1.9.1]


inter-camera appearance variance:
The phenomenon that the appearance 
of an object in one camera may be 
very different from its appearance in 
another camera because of differences 
in illumination, pose, camera proper- 
ties, geometry, occlusion or a host of 
other factors. [GX11:Ch. 13] 


inter-camera gap: Generally, the
unknown distance between two imaging
devices when sampling a scene. In
satellite imagery, this has a tendency to 
make image registration more difficult 
since the motion induces changes in 
contrast and brightness: [GX11:Ch. 13] 


[Figure: Camera 1 and Camera 2]


interest point: A general term for pix- 
els that have some interesting prop- 
erty. Interest points are often used for 
making feature point correspondences 
between images. Subsequently, the 
points usually have some identifiable 
property. Further, because of the need 
to limit the combinatorial explosion 
that matching methods can produce, 
interest points are often expected to be 
infrequent in an image. Interest points 
are often points of high variation in 
pixel values. See also point feature. 
The figure shows interest points from 
the Harris corner detector (courtesy of
Marc Pollefeys): [SB11:10.9]


interest point feature detector: An 
operator applied to an image to locate 
interest points. Well-known exam- 
ples are the Moravec interest point 
operator and the Plessey corner finder. 
[SB11:10.9] 


interference: When ordinary light inter- 
acts with matter that has dimensions 
similar to the wavelength of the light or 
coherent light interacts with itself. The 
most notable effect from a computer 
vision perspective is the production 
of interference fringes and the speckle 
of laser illumination. May alternatively 
refer to electrical interference which 
can affect an image when it is being 
transmitted on an electrical medium. 
[Hec87:Ch. 9] 


interference fringe: When optical 
interference occurs, the most notice- 
able effect is the production of inter- 
ference fringes where the light illu- 
minates a surface. These are paral- 
lel, roughly equally spaced lighter 
and darker bands of brightness. One 
important consequence of these bands
is blurring of the edge positions. 
[Hec87:9.1] 


interferometric SAR: An enhancement
of synthetic aperture radar sens- 
ing to incorporate phase informa- 
tion from the reflected signal, increas- 
ing accuracy. [WP:Interferometric_ 
synthetic_aperture_radar] 


interior orientation: A photogrammetry
term for the calibration of the intrinsic
parameters of a camera, including
its focal length, principal point, lens 
distortion etc. This allows transforma- 
tion of measured image coordinates 
into camera coordinates. [JKS95:12.9] 


interlaced scanning: A technique
arising from television engineering, 
whereby alternate rows (rather than 
consecutive rows) of an image are 
scanned or transmitted. As a result, one 
television frame is transmitted by send- 
ing first the odd rows, forming the odd 
field, and then the even rows, forming 
the even field. [Gal90:4.1.2] 


intermediate representation: A representation
that is created as a stage in
the derivation of a specific represen- 
tation from an input representation. 
For example, in Marr’s theory, the raw 
primal sketch, full primal sketch and 
2.5D sketch were intermediate repre- 
sentations between input images and 
a 3D model. In the figure, a binary 
image of the noticeboard is an interme- 
diate representation between the input 
image and the textual output. [NMPO1] 
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internal energy (or force): A measure 


of the stability (such as smoothness) of 
an active shape model or deformable 
contour model which is part of the 
deformation energy. This measure is 
used to constrain the appearance of the 
model. [WP:Internal_energy] 


internal parameter: See intrinsic
parameter.


interpolation: A mathematical process
whereby a value is inferred from other 
nearby values or from a mathemati- 
cal function linking nearby values. For 
example, dense values along a curve 
can be linearly interpolated between 
two known curve points by fitting a 
line connecting the two curve points. 
Image, surface and volume values can 
be interpolated, as well as higher 
dimensional structures. Interpolating 
functions can be curved as well as lin- 
ear. [BB82:A1.11] 
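As a minimal sketch of the linear case described above, the following Python fragment generates dense points on the line joining two known curve points. The function name `lerp_points` and the sample coordinates are illustrative assumptions, not taken from the cited source.

```python
def lerp_points(p0, p1, n):
    """Return n + 1 evenly spaced points on the line from p0 to p1 (inclusive)."""
    return [
        tuple(a + (b - a) * t / n for a, b in zip(p0, p1))
        for t in range(n + 1)
    ]

# Hypothetical known curve points; the intermediate values are inferred.
dense = lerp_points((0.0, 0.0), (4.0, 2.0), 4)
```

A curved interpolating function (e.g., a spline) would replace the straight-line formula inside the comprehension.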


interpretation tree search: A matching 
method used between members in two 
discrete sets. For each feature from the 
first set, it builds a depth-first search 
tree considering all possible matching 
features from the second set. After a 
match is found for one feature (by sat- 
isfying a set of consistency tests), then 
it tries to match the remaining fea- 
tures. The algorithm can cope when 
no match is possible for a given feature 
by allowing a given number of skipped 
features. The figure shows a partial 
interpretation tree that matches model 
features to data features: [TV98:10.2] 
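The search described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the algorithm of the cited source: features are 2D points, the only consistency test is that pairwise distances agree between model and data, and the names `interpretation_tree`, `max_skips` and `tol` are hypothetical. The search returns the first consistent interpretation it reaches.

```python
import math

def interpretation_tree(model, data, max_skips=0, tol=1e-6):
    """Depth-first search assigning a data index (or None) to each model feature."""

    def consistent(assign, m_idx, d_idx):
        # Pairwise test: distances between matched model features must
        # equal distances between the corresponding data features.
        for prev_m, prev_d in enumerate(assign):
            if prev_d is None:
                continue
            if prev_d == d_idx:
                return False  # each data feature is used at most once
            dm = math.dist(model[prev_m], model[m_idx])
            dd = math.dist(data[prev_d], data[d_idx])
            if abs(dm - dd) > tol:
                return False
        return True

    def search(assign, skips):
        m_idx = len(assign)
        if m_idx == len(model):
            return assign
        for d_idx in range(len(data)):
            if consistent(assign, m_idx, d_idx):
                result = search(assign + [d_idx], skips)
                if result is not None:
                    return result
        if skips < max_skips:
            # No match possible (or desired): skip this model feature.
            return search(assign + [None], skips + 1)
        return None

    return search([], 0)

model = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
data = [(10.0, 14.0), (10.0, 10.0), (13.0, 10.0)]   # shifted, reordered model
match = interpretation_tree(model, data)

partial = interpretation_tree(model, data[1:], max_skips=1)  # one feature missing
```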


inter-reflection: The reflection caused
by light reflected off a surface and 
bouncing off another surface of 
the same object. See also mutual 
illumination. [WP:Diffuse_reflection# 
Interreflection] 


interval tree: An efficient structure for
searching in which every node in the
tree is a parent to nodes in a particular
interval of values. [WP:Interval_tree]


intrinsic parameter: Parameters, such
as focal length, coefficients of radial
lens distortion and the position of the
principal point, that describe the mapping
from image pixels to world rays in a
camera. Determining the parameters of
this mapping is the task of camera
calibration. For a pinhole camera, world
rays $\vec{r}$ are mapped to homogeneous
image coordinates $\vec{x}$ by
$\vec{x} = K\vec{r}$, where $K$ is the upper
triangular 3 × 3 matrix

$$K = \begin{pmatrix} \alpha_u f & s & u_0 \\ 0 & \alpha_v f & v_0 \\ 0 & 0 & 1 \end{pmatrix}$$

In this form, $f$ represents the focal
length, $s$ is the skew angle between the
image coordinate axes, $(u_0, v_0)$ is the
principal point and $\alpha_u$ and $\alpha_v$
are the aspect ratios (e.g., in pixels/mm)
in the $u$ and $v$ image directions.
[FP03:2.2]
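A small numerical sketch of the pinhole mapping may help. It assumes the upper triangular form of K with entries built from the focal length, the two aspect ratios, the skew and the principal point; all numeric values and the function name `project` are made up for illustration.

```python
def project(K, r):
    """Map a ray r = (X, Y, Z) to pixel coordinates using x = K r."""
    x = [sum(K[i][j] * r[j] for j in range(3)) for i in range(3)]
    # Divide by the third homogeneous coordinate to obtain (u, v).
    return (x[0] / x[2], x[1] / x[2])

# Hypothetical calibration values: focal length 8 mm, 100 pixels/mm,
# zero skew, principal point at (320, 240).
f, alpha_u, alpha_v, s, u0, v0 = 8.0, 100.0, 100.0, 0.0, 320.0, 240.0
K = [
    [alpha_u * f, s,           u0],
    [0.0,         alpha_v * f, v0],
    [0.0,         0.0,         1.0],
]

pixel = project(K, (0.1, -0.05, 1.0))
```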


intrinsic dimensionality: The number
of dimensions (degrees of freedom)
inherent in a data set, independent
of the dimensionality of the space in
which it is represented. For example, a
curve in 3D space is intrinsically 1D
although its points are represented in
three dimensions. [WP:Intrinsic_dimension]


intrinsic image: A term describing
one of a set of images registered
with the input intensity image that
describe properties intrinsic to the
scene, instead of properties of the
input image. Example intrinsic images
include: distance to scene points,
scene surface orientations, surface
reflectance etc. The figure shows (left)
an intensity image and (right) a depth
image registered with it. [BB82:1.5]


intruder detection: An application of
machine vision, usually analyzing a
video sequence to detect the appearance
of an unwanted person in a scene.
[SQ04:17.5]


invariant: Something that does not 
change under specified operations 
(e.g., a translation-invariant moment). 
[WP:Invariant_(mathematics)]


invariant contour function: The con- 
tour function characterizes the shape 
of a planar figure based on the exter- 
nal boundary. Values invariant to posi- 
tion, scale or orientation can be com- 
puted from the contour functions. 
These invariants can be used for recog- 
nition of instances of the planar figure. 
[BCZ93] 


invariant feature: Feature point that has 
properties that do not change despite 
differences in imaging circumstances 
such as illumination, camera geometry 
(i.e., spatial transformation) and scale. 
Used in object recognition, with a sem- 
inal example being the SIFT descriptor. 
[Low99] 


invariant Legendre moments: An 
invariant extension to Legendre 
moments. [ZSH+10b] 


invariant theory: A theoretical frame- 
work in algebra that covers invariant 
polynomials under various transforma- 
tions in a closed linear group. [DC70] 


inverse compositional algorithm: A
fast and efficient image registration
algorithm. Rather than updating the
additive estimate of warp parameters
$\Delta p$, it iteratively solves for an
inverse incremental warp
$W(x; \Delta p)^{-1}$. [BM01]


inverse convolution: See 
deconvolution. 


inverse Fourier transform: A transfor- 
mation that allows a signal to be recre- 
ated from its Fourier coefficients. See 
Fourier transform. [Umb98:2.5.1] 


inverse halftoning: Filtering or smooth- 
ing approaches that reconstruct a gray 
scale image from a given error diffused 
image. See also halftoning. 


inverse imaging problem: The process
of turning observed image model
parameters back into image data.
For example deconvolution, inverse
halftoning and decompression. See
also inverse problems.


inverse light transport: The process of 
decomposing a rendered or real image 
into a sum of bounce images, where 
each one records the contribution of 
light that bounces a known number 
of times before reaching the camera. 
[SMK04] 


inverse problem: Relating generated 
data to an underlying description of 
a problem. For example, computer 
graphics generates images from scenes 
of objects by modeling the image 
formation process. This is the regu- 
lar, non-inverse problem case. By con- 
trast, computer vision is concerned 
with the inverse problem, i.e., to 
go from the data (in this case, an 
image) to the underlying description. 
An inverse problem is often an ill-posed 
problem and may be tackled using 
regularization theory. [Pal99:1.2.3] 


inverse rendering: A collection of 
techniques for reverse-engineering 
the geometry of real objects in a 
scene from images. These techniques 
include inverse light transport and 
inverse reflectance. Related to shape 
from shading. [MTHI03]


inverse square law: A physical law that
says the illumination power received
at distance $d$ from a point light source
is inversely proportional to the square
of $d$, i.e., it is proportional to
$1/d^2$. [WP:Inverse-square_law]


inverse tone mapping: A high dynamic 
range imaging technique that attempts 
to replicate the multiple-exposure 
approach by expanding the exposure 
range of a single image to span the 
range necessary for the creation of a 
high-dynamic range image. [BLDC06] 


invert operator: A low-level
image-processing operation where a
new image is formed by replacing
each pixel by an inverted value. For
binary images, this is 1 if the input
pixel is 0 and 0 if the input pixel is
1. For gray scale images, this depends
on the maximum range of intensity
values. If the range of intensity values
is [0,255] then the inverse of a pixel
with value $x$ is $255 - x$. The result
looks like a photographic negative.
[Gal90:5.1.2]
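The operation is simple enough to sketch directly. The fragment below assumes an image stored as a list of rows of integers; the function name `invert` and the `max_value` parameter are illustrative choices.

```python
def invert(image, max_value=255):
    """Replace each pixel p by max_value - p (use max_value=1 for binary images)."""
    return [[max_value - p for p in row] for row in image]

gray = [[0, 100], [255, 30]]
negative = invert(gray)

binary = [[0, 1, 1]]
flipped = invert(binary, max_value=1)
```

Applying `invert` twice with the same `max_value` returns the original image.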


IR: See infrared light. 


iris recognition: A biometric imaging 
technique that uses the physiological 
structure of striations in the human iris 
to perform recognition. It is thought to 
be less accurate than biometrics based 
on retinal imaging although it is rela- 
tively easy to use. [Dau02] 


irradiance: The amount of energy 
received at a point on a surface 
from the corresponding scene point. 
[JKS95:9.1] 


irregular octree: Like an irregular 
quadtree except the splits are in three 
dimensions. 


irregular quadtree: A quadtree
decomposition in which each split need
not be exactly in half.


isomap: Isometric feature mapping. An 
algorithm for nonlinear dimensionality 
reduction where the data has manifold 
structure. Given n data points in high 
dimension, isomap returns n points 
in a lower dimensional space. It does 
this by approximating geodesic dis- 
tances on the manifold by distances 
on a neighborhood graph determined 
from the data points and then apply- 
ing classical multidimensional scaling 
to the resulting distance matrix. 
[Bis06:12.4.3]


isometry: A transformation that
preserves distances. The transformation
$T : \vec{x} \mapsto \vec{u}$ is an isometry
if, for all pairs $(\vec{x}, \vec{y})$,
$|\vec{x} - \vec{y}| = |T(\vec{x}) - T(\vec{y})|$.
[HZ00:1.4.1]

isophote curvature: An isophote is a
curve of constant image intensity. The
curvature is defined at any given pixel
as $-L_{vv}/L_w$, where $L_w$ is the
magnitude of the gradient perpendicular
to the isophote and $L_{vv}$ is the
curvature of the intensity surface along
the isophote at that point. [VG08]


iso-surface: A surface in a 3D space
where the value of some function is
constant, i.e., $f(x, y, z) = C$ where $C$
is a constant. [WP:Isosurface]


isotropic gradient operator: A 
gradient operator that computes the 
scalar magnitude of the gradient, i.e., 
a value that is independent of edge 
direction. [JKS95:5.1] 


isotropic operator: An operator that 
produces the same output irrespec- 
tive of the local orientation of the 
pixel neighborhood where the oper- 
ator is applied. For example, a mean 
smoothing operator produces the 
same output value, even if the image 
data is rotated at the point where the 
operator is being applied. On the other 
hand, a directional derivative opera- 
tor would produce different values 
if the image were rotated. This con- 
cept is particularly relevant to feature 
detection: some detectors are sensi- 
tive to the local orientation of the 
image pixel values and some are not
(isotropic). [Gal90:6.4.1]


isotropic scaling: A uniform
transformation that multiplies all values
in a vector by the same scaling factor:
$\vec{x}' = a\vec{x}$, where $a$ is the
isotropic scaling factor.


iterated closest point: See iterative 
closest point. 


iterative closest point (ICP): A shape 
alignment algorithm that works by iter- 
ating its two-stage process until some 
termination point: 

1. Given an estimated transformation 
of the first shape onto the second, 
find the closest feature from the sec- 
ond shape for each feature of the 
first shape. 

2. Given the new set of closest fea- 
tures, re-estimate the transforma- 
tion that maps the first feature set 
onto the second. 

Most variations of the algorithm need a
good initial estimate of the alignment.
[FP03:21.3.2]
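The two-stage loop can be sketched in Python as below. The sketch makes several assumptions that are not part of the entry: features are 2D points, closest features are found by brute-force search, the transformation is a rigid rotation plus translation estimated in closed form by least squares, and termination is a fixed iteration count. All function names are hypothetical.

```python
import math

def closest_points(src, dst):
    """Stage 1: for each source point, pick the nearest destination point."""
    return [min(dst, key=lambda q: math.dist(p, q)) for p in src]

def fit_rigid_2d(src, dst):
    """Stage 2: least-squares rotation and translation mapping paired src onto dst."""
    n = len(src)
    csx = sum(p[0] for p in src) / n
    csy = sum(p[1] for p in src) / n
    cdx = sum(q[0] for q in dst) / n
    cdy = sum(q[1] for q in dst) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - csx, py - csy      # centered source point
        bx, by = qx - cdx, qy - cdy      # centered destination point
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    tx = cdx - (c * csx - s * csy)
    ty = cdy - (s * csx + c * csy)
    return theta, tx, ty

def apply_rigid_2d(points, theta, tx, ty):
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def icp(src, dst, iterations=20):
    """Alternate the two stages for a fixed number of iterations."""
    current = list(src)
    for _ in range(iterations):
        matches = closest_points(current, dst)
        theta, tx, ty = fit_rigid_2d(current, matches)
        current = apply_rigid_2d(current, theta, tx, ty)
    return current

target = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 3.0), (2.0, 5.0)]
moved = apply_rigid_2d(target, 0.1, 0.3, -0.2)   # small known misalignment
aligned = icp(moved, target)
```

The example starts from a small misalignment, which reflects the need for a good initial estimate: with a large initial error the nearest-point matches in stage 1 can be wrong and the loop may settle in a poor alignment.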


IUE: See Image Understanding
Environment.


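Jacobian: The matrix of derivatives of
a vector function. Typically if the
function $\vec{f}(\vec{x})$ is written in
component form as:

$$\vec{f}(\vec{x}) = \vec{f}(x_1, x_2, \ldots, x_p) = \begin{pmatrix} f_1(x_1, x_2, \ldots, x_p) \\ f_2(x_1, x_2, \ldots, x_p) \\ \vdots \\ f_n(x_1, x_2, \ldots, x_p) \end{pmatrix}$$

then the Jacobian $J$ is the $n \times p$ matrix

$$J = \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_p} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \cdots & \frac{\partial f_n}{\partial x_p} \end{pmatrix}$$

[SQ04:2.2.1]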
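When analytic derivatives are inconvenient, the Jacobian can be approximated numerically. The following sketch uses central finite differences, which is one common choice and an assumption here rather than something prescribed by the entry; `numerical_jacobian`, the step size `h` and the example function are illustrative.

```python
def numerical_jacobian(f, x, h=1e-6):
    """Approximate the n x p Jacobian of f at x by central differences."""
    n, p = len(f(x)), len(x)
    J = [[0.0] * p for _ in range(n)]
    for j in range(p):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(n):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J

def example(x):
    # A function from R^2 to R^3, so the Jacobian is 3 x 2.
    return [x[0] * x[1], x[0] + x[1] ** 2, 3 * x[0]]

J = numerical_jacobian(example, [2.0, 3.0])
```

For this example the analytic Jacobian at (2, 3) has rows (3, 2), (1, 6) and (3, 0).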


joint entropy registration: Registra- 
tion of data using joint entropy (a mea- 
sure of the degree of uncertainty) as 
a criterion. [WP:Mutual_information# 
Applications_of_mutual_information] 


joint invariant: An invariant function
$J : X_1 \times \ldots \times X_m \to \mathbb{R}$
for a Cartesian product of a group $G$
such that
$J(g \cdot x_1, \ldots, g \cdot x_m) = J(x_1, \ldots, x_m)$
for $\forall g \in G$, $x_i \in X_i$. [OT99:76-77]


joint probability distribution: Consider
a vector-valued random variable
$X = (x_1, \ldots, x_d)$. For discrete random
variables, the joint probability
distribution $p(X)$ is the probability of
the variables jointly taking on a certain
configuration. For continuous random
variables there is a corresponding
probability density function $p(X)$,
such as the multi-variate normal
distribution. See also marginal
distribution and conditional probability.
[Bis06:1.2]


JPEG: A common format for compressed
image representation designed by the
Joint Photographic Experts Group
(JPEG). [Umb98:1.8]


JPEG 2000: An image compression tech- 
nique and algorithm based on JPEG 
where the DCT is replaced with two 
different wavelet decompositions, the 
“CDF 9/7” and “CDF 5/3” wavelet 
descriptors. The JPEG 2000 standard 
provides for both lossless compression 
and lossy compression. [AR05:Ch. 17]


junction label: A symbolic label for the 
pattern of edges meeting at a junc- 
tion. This approach is mainly used in 
blocks world scenes where all objects 
are polyhedra, and thus all lines are 
straight and meet at only a limited 
number of configurations. See also line 
label. The figure shows a “Y” junction
(i.e., the corner of a block seen front
on) and an “arrow” junction (i.e., the
corner of a block seen from the side):
[Nal93:4.1.1]


K2 algorithm: A heuristic algorithm that
searches for the most probable belief- 
network structure given a database of 
cases by comparing various probabil- 
ity ratios and ranking the structures by 
their posterior probabilities. [CH91] 


k-means: An iterative squared error
clustering algorithm. Input is a set of
points $\{\vec{x}_i\}_{i=1}^{n}$ and an initial
guess at the locations
$\vec{c}_1, \ldots, \vec{c}_k$ of $k$ cluster
centers. The algorithm alternates two
steps: points are assigned to the cluster
center closest to them, and then the
cluster centers are recomputed as the
mean of the associated points. Iterating
yields an estimate of the $k$ cluster
centers that is likely to minimize
$\sum_i \min_j |\vec{x}_i - \vec{c}_j|^2$.
[FP03:14.4.2]
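The two alternating steps can be sketched as follows. This is a minimal illustration assuming a fixed number of iterations and points stored as tuples; keeping the old center when a cluster becomes empty is a design choice of the sketch, not part of the entry.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points, centers, iterations=10):
    """Alternate assignment and mean-update steps for a fixed number of rounds."""
    centers = list(centers)
    for _ in range(iterations):
        # Step 1: assign every point to its closest cluster center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda j: squared_distance(p, centers[j]))
            clusters[nearest].append(p)
        # Step 2: recompute each center as the mean of its points
        # (an empty cluster keeps its previous center).
        centers = [mean_point(c) if c else old
                   for c, old in zip(clusters, centers)]
    return centers

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```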


k-means clustering: See k-means. 


k-medians (also k-medoids): A
variant of k-means clustering in
which multidimensional medians
are computed instead of means.
The definition of multidimensional
median varies, but options for the
median $\vec{m}$ of a set of points
$\{\vec{x}_i\}_{i=1}^{n}$, i.e.,
$\{(x_i^1, \ldots, x_i^d)\}_{i=1}^{n}$, include
the componentwise definition
$\vec{m} = (\mathrm{median}\{x_i^1\}_{i=1}^{n}, \ldots, \mathrm{median}\{x_i^d\}_{i=1}^{n})$
and the analog of the one-dimensional
definition
$\vec{m} = \mathrm{argmin}_{\vec{m} \in \mathbb{R}^d} \sum_{i=1}^{n} |\vec{m} - \vec{x}_i|$.
[WP:K-medians_clustering]


k-nearest-neighbor algorithm: A 
nearest neighbor algorithm that uses 
the classifications of the nearest k 
neighbors when making a decision. 
[FP03:22.1.4] 


Kalman filter: A recursive linear
estimator of a varying state vector and
associated covariance from observations,
their associated covariances and a
dynamic model of the state evolution.
Improved estimates are calculated as
new data is obtained. [FP03:17.3]
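The recursion is easiest to see in one dimension. The sketch below assumes a scalar state that is nominally constant (a random-walk dynamic model with process variance `q`) observed directly with measurement variance `r`; these modeling choices and all names are assumptions for illustration.

```python
def kalman_1d(measurements, q, r, x0, p0):
    """Scalar Kalman filter; returns the (estimate, variance) after each measurement."""
    x, p = x0, p0
    history = []
    for z in measurements:
        # Predict: the state is modeled as constant, so only uncertainty grows.
        p = p + q
        # Update: blend prediction and measurement using the Kalman gain.
        gain = p / (p + r)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p
        history.append((x, p))
    return history

history = kalman_1d([5.0] * 9, q=0.0, r=1.0, x0=0.0, p0=1.0)
```

With these settings each new measurement moves the estimate closer to 5 while the variance shrinks, which is the "improved estimates as new data is obtained" behavior described above.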


Karhunen–Loève transformation: The
projection of a vector (or an
image treated as a vector) onto an 
orthogonal space that has uncorrelated 
components constructed from the 
autocorrelation (scatter) matrix of a 
set of example vectors. An advantage 
is that the orthogonal components 
have a natural ordering (by the largest 
eigenvalues of the covariance of the 
original vector space) so that one can 
select the most significant variation 
in the dataset. The transformation 
can be used as a basis for image com- 
pression, for estimating linear models 
in high-dimensional datasets and for 
estimating the dominant modes of vari- 
ation in a dataset etc. It is also known 
as the “principal component trans- 
formation” (see principal component 
analysis (PCA)). The following image 
shows a dataset before and after 
the KL transform was applied: 
[Jai89:5.11]


Kendall’s shape space: A description
system for semi-invariant geometric
shapes where sets of point coordinates
(configurations) are centered on
the origin and scaled such that size is
then defined as the sum of squared
Euclidean distance from the point to
the configuration centroid. [Ken89]


kernel: 1) A small matrix of numbers
that is used by the image convolution
operator.
2) The structuring element used in
mathematical morphology operations.
See also kernel function.


kernel canonical correlation analysis:
An extension of canonical correlation
analysis (CCA) which is equivalent to
carrying out CCA in a high-dimensional
feature space. The kernel trick means
that the computations can be done on an
n × n matrix (if there are n data points)
rather than in the feature space.
[SC04:6.5]


kernel Fisher discriminant analysis: A
classification method using the kernel
trick to lift Fisher linear discriminant
analysis into feature space. This has
the properties that: curved classification
boundaries can be produced; the
classification boundaries are defined
locally by the classes rather than
globally; and a high-dimensional feature
space is avoided by using the kernel
trick. [SC04:5.4]


kernel function: 1) A function in an
integral transformation (e.g., the
exponential term in the Fourier transform).
2) A function applied at every point in
an image (see convolution operator).
3) A function $k(\vec{x}, \vec{x}')$ mapping two
inputs $\vec{x}$, $\vec{x}'$ to $\mathbb{R}$ so that
the resulting Gram matrix is positive
definite for any set of inputs. The function
in some sense quantifies the similarity of
inputs $\vec{x}$ and $\vec{x}'$. The kernel
function can also be thought of as a
dot-product in a feature space defined by
a nonlinear mapping of $\vec{x}$ to
$\phi(\vec{x})$. It is used, e.g., in kernel
Fisher discriminant analysis and the
support vector machine. An example
kernel is the radial basis function kernel:
$k(\vec{x}, \vec{x}') = \exp(-\frac{1}{2\sigma^2} \|\vec{x} - \vec{x}'\|^2)$.
[SS02:p. 30]


kernel learning: For methods such as
the support vector machine and Gaussian
process regression which depend on a
kernel function (definition 3), there may
be some free parameters in the kernel
which need to be set. Similarly, methods
for making linear combinations of kernels
(called multiple kernel learning) also
require the setting of the relative
weights of the different kernels. Kernel
learning refers to the general process of
setting these parameters. [Mur12:12.5]


kernel principal component analysis
(KPCA): An extension of principal
component analysis (PCA) that is
equivalent to a nonlinear mapping of
the data into a high-dimensional feature
space in which the global axes of
maximum variation are extracted. The
kernel trick means that the computations
can be done on an n × n matrix
(for n data points) rather than in the
feature space. KPCA can be used for
feature extraction. [SS02:Ch. 14]


kernel regression: A non-parametric
method for formulating regression
problems. Given a dataset of input-output
pairs $(\vec{x}_i, y_i)$ for $i = 1, \ldots, n$,
the kernel regression estimate of the
value of the function at input point
$\vec{x}$ is given by
$\sum_{i=1}^{n} y_i k(\vec{x}, \vec{x}_i) / \sum_{i=1}^{n} k(\vec{x}, \vec{x}_i)$,
where $k(\cdot, \cdot)$ denotes a kernel function.
[Bis06:6.3]


kernel ridge regression: A generalization
of linear regression which is equivalent
to carrying out ridge regression
in a high-dimensional feature space.
The kernel trick means that the
computations can be done on an n × n
matrix (if there are n data points) rather
than in the feature space. The resulting
predictor has the form
$f(\vec{x}) = \sum_{i=1}^{n} \alpha_i k(\vec{x}, \vec{x}_i)$,
where $k(\cdot, \cdot)$ denotes a kernel function,
and the coefficients $\{\alpha_i\}$ are
determined by solving a linear system.
See also Gaussian process regression.
[SC04:2.2]


kernel trick: If an algorithm depends on
the input data vector solely in terms of
inner products between pairs of vectors,
then it can be lifted into feature
space by replacing occurrences of the
inner products by the kernel function
(definition 3). This is used, e.g., in
kernel principal component analysis
and kernel Fisher discriminant analysis.
[SS02:p. 34]
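To make the kernel regression entry concrete, here is a minimal sketch of the kernel-weighted average using a radial basis function kernel as described under kernel function (definition 3). The width `sigma`, the function names and the toy data are assumptions for illustration only.

```python
import math

def rbf_kernel(x, x2, sigma=1.0):
    """Radial basis function kernel exp(-||x - x2||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, x2))
    return math.exp(-sq / (2.0 * sigma ** 2))

def kernel_regression(x, xs, ys, sigma=1.0):
    """Kernel-weighted average of the training outputs ys at query point x."""
    weights = [rbf_kernel(x, xi, sigma) for xi in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

xs = [(0.0,), (1.0,), (2.0,)]       # hypothetical 1D inputs
ys = [0.0, 1.0, 2.0]                # outputs of a linear function
estimate = kernel_regression((1.0,), xs, ys)
```

A smaller `sigma` makes the estimate follow the nearest training points more closely, while a larger one averages over more of the data; choosing it is an instance of kernel learning.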


key frame: Primarily a computer graphics
animation technique, where key frames
in a sequence are drawn by more
experienced animators and intermediate
interpolating frames are drawn by less
experienced animators.
In computer vision motion sequence 
analysis, key frames are the analo- 
gous video frames, typically displaying 
motion discontinuities between which 
the scene motion can be smoothly 
interpolated. [WP:Key_frame] 


KHOROS: An image processing devel- 
opment environment with a large set 
of operators. The system comes with 
a pull-down interactive development 
workspace where operators can be 
instantiated and connected by click- 
and-drag operations. [KR94] 


kinematic motion models: Mathemat- 
ical descriptions of the mechanical 
movement of objects or collections 
of objects. Sometimes derived from 
motion capture or articulated object 
tracking. [Moe11:Ch. 10] 


kinetic depth: A technique for esti- 
mating the depth at image feature 
points (usually edges) by exploiting a 
controlled sensor motion. This tech- 
nique generally does not work at all 
points of the image because of insuf- 
ficient image structure or sensor pre- 
cision in smoothly varying regions, 
such as walls. See also shape from 
motion. A typical motion case is for 
the camera to rotate on a circular 
trajectory while fixating on a point 
in front of the camera:
[WP:Depth_perception#Monocular_cues]


Kirsch compass edge detector: A first
derivative edge detector that computes
the gradient in different directions
according to which calculation mask is
used. Edges have high gradient values,
so thresholding the intensity gradient
magnitude is one approach to edge
detection. The following Kirsch mask
detects edges at 45°: [Umb98:2.3.4]


knowledge-based vision: A style of
image interpretation that relies on
multiple processing components capable
of different image analysis processes,
some of which may solve the same task
in different ways. The components are
linked by a reasoning algorithm that
knows about the capabilities of the
different components, when they might
be usable or might fail. An additional
common component is some form
of task-dependent knowledge encoded
in a knowledge representation that
is used to help guide the reasoning
algorithm. Also common is some
uncertainty representation mechanism
that records the confidence that the
system has about the outcomes of its
processing. For example, a
knowledge-based vision system might be
used for aerial analysis of road networks,
containing specialized detection modules
for straight roads, road junctions and
forest roads as well as survey maps,
terrain type classifiers, curve linking etc.
[Nev82:10.2]


knowledge representation: A general
term for methods of encoding
knowledge in computers. In computer
vision systems, this is usually knowledge
about recognizable objects and
visual-processing methods. A common
knowledge representation scheme is
the geometric model that records the
2D or 3D shapes of objects. Other
commonly used vision knowledge
representation schemes are graph models
and frames. [BT88:Ch. 9]


Koenderink’s surface shape
classification: An alternative to the
more common mean curvature and
Gaussian curvature 3D surface shape
classification labels. Koenderink’s
scheme decouples the two intrinsic
shape parameters into one parameter
(S) that represents the local surface
shape (including cylindrical, hyperbolic,
spherical and planar) and a
second parameter (C) that encodes
the magnitude of the curvedness of the
shape. The figure illustrates the shape
classes represented in Koenderink’s
classification scheme: [KvD79]


Kohonen network: A multi-variate data
clustering and analysis method that
produces a topological organization of
the input data. The response of the
whole network to a given data vector
can be used as a lower-dimensional
signature of the data vector.
[WP:Counterpropagation_network]


Krawtchouk moments: Discrete orthogonal
moments based on Krawtchouk polynomials
$K_n(x; p, N) = \sum_{k=0}^{N} a_{k,n,p} x^k$.
[YP03]


kTC noise: A type of noise associated
with field effect transistor (FET) image
sensors. It is called “kTC” noise because
the noise is proportional to $\sqrt{kTC}$
where $T$ is the temperature, $C$ is the
capacitance of the image sensor and $k$ is
Boltzmann’s constant. This noise arises
during image capture at each pixel
independently and is also independent of
integration time.
[WP:Johnson-Nyquist_noise#Thermal_noise_on_capacitors]


Kullback-Leibler divergence: A measure
of the relative entropy or divergence
between two probability densities
$p_1(x)$ and $p_2(x)$, defined as
$D(p_1 \| p_2) = \int p_1(x) \log \frac{p_1(x)}{p_2(x)} \, dx$.
Note that it is not symmetric between
the two arguments, and thus is not a
distance. [CT91:Ch. 2]
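The asymmetry is easy to check numerically. The sketch below uses the discrete analog of the definition (a sum over probability mass values instead of an integral over densities), which is an assumption made for simplicity; the distributions are arbitrary examples.

```python
import math

def kl_divergence(p, q):
    """Discrete KL divergence D(p || q) in nats; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
forward = kl_divergence(p, q)    # D(p || q)
backward = kl_divergence(q, p)   # D(q || p), generally a different value
```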


kurtosis: A measure of the flatness of a
probability distribution. If
$m_4 = E[(X - \mu)^4]$ is the fourth central
moment of the distribution of $X$ relative
to the mean $\mu$ and $\sigma^2$ is its
variance, then kurtosis $= m_4/\sigma^4 - 3$.
The $-3$ term is chosen so that the
Gaussian distribution has zero kurtosis.
[PTVF92:14.1]
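As a small worked sketch, the sample version of this quantity can be computed as below. Using population (divide by n) moments is an assumption of the sketch; the function name is illustrative.

```python
def excess_kurtosis(samples):
    """Fourth central moment over squared variance, minus 3."""
    n = len(samples)
    mu = sum(samples) / n
    variance = sum((x - mu) ** 2 for x in samples) / n
    m4 = sum((x - mu) ** 4 for x in samples) / n
    return m4 / variance ** 2 - 3.0

# A two-valued symmetric sample is as "flat" as a distribution can be.
two_point = excess_kurtosis([-1.0, 1.0, -1.0, 1.0])
```

For the two-point sample the mean is 0, the variance is 1 and the fourth central moment is 1, giving 1/1 - 3 = -2.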


Kuwahara: An edge-preserving noise
reduction filter. The filter uses four
regions surrounding the pixel being
smoothed. The smoothed value for that
pixel is the mean value of the region
with smallest variance. [BVV99]
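A minimal sketch of the filter follows. It assumes the four regions are the overlapping square quadrants of side `r + 1` that each have the pixel at one corner, and it clips regions at the image border; both choices, and the function name, are assumptions rather than details given in the entry.

```python
def kuwahara(image, r=1):
    """Replace each pixel by the mean of its lowest-variance quadrant."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best = None
            # Offsets of the top-left corner of each quadrant relative to (y, x).
            for dy, dx in ((-r, -r), (-r, 0), (0, -r), (0, 0)):
                values = [image[yy][xx]
                          for yy in range(max(0, y + dy), min(h, y + dy + r + 1))
                          for xx in range(max(0, x + dx), min(w, x + dx + r + 1))]
                mean = sum(values) / len(values)
                variance = sum((v - mean) ** 2 for v in values) / len(values)
                if best is None or variance < best[0]:
                    best = (variance, mean)
            out[y][x] = best[1]
    return out

step = [[0, 0, 0, 10, 10, 10] for _ in range(4)]   # ideal vertical edge
smoothed_step = kuwahara(step)

spike = [[0] * 5 for _ in range(5)]
spike[2][2] = 8                                    # isolated noisy pixel
smoothed_spike = kuwahara(spike)
```

On the step image every pixel has a quadrant lying entirely on its own side of the edge, so the edge is left untouched, while the isolated spike is averaged down.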


$L_1$ norm: For a $d$-dimensional vector
$\vec{x}$, $\|\vec{x}\|_1 = \sum_{i=1}^{d} |x_i|$.
[GL89:p. 53]


$L_2$ norm: For a $d$-dimensional vector
$\vec{x}$, $\|\vec{x}\|_2 = (\sum_{i=1}^{d} x_i^2)^{1/2}$.
[GL89:p. 53]


$L_\infty$ image coding: Image coding
or compression techniques which
directly address minimizing the $L_\infty$
norm (maximum deviation norm) for
reconstruction error, rather than the
more common $L_2$ norm as used in
JPEG.


label: A description associated with 
something for the purposes of 
identification (see region labeling). 
[BB82:12.4] 


labeling problem: Given a set $S$ of
image structures (which may be pixels
as well as more structured objects
such as edges) and a set of labels $L$,
the labeling problem is the question of
how to assign a label $l \in L$ for each
image structure $s \in S$. The process is
usually dependent both on the image 
data and on neighboring labels. A typ- 
ical remote-sensing application is to 
label image pixels by their land type, 
such as water, snow, sand, wheat field, 
forest etc. [BB82:12.4] In the figure, a 
range image (left) has its pixels labeled 
by the sign of their mean curvature 
(right: white is negative; light gray is 
zero; dark gray is positive; black is miss- 
ing data): 


lacunarity: A scale-dependent measure 
of translational invariance based on the 
size distribution of holes within a set. 
High lacunarity indicates that the set is 
heterogeneous and low lacunarity indi- 
cates homogeneity. [PS06:3.3] 


LADAR: LAser Detection And Ranging or 
Light Amplification for Detection and 
Ranging. See laser radar. [BB82:2.3.2] 


Lagrange multiplier technique: A
method of constrained optimization to
find a solution to a numerical problem
that includes one or more constraints.
The classical form of the Lagrange
multiplier technique finds the parameter
vector $\vec{v}$ minimizing (or maximizing)
the function
$f(\vec{v}) = g(\vec{v}) + \mu h(\vec{v})$,
where $g(\vec{v})$ is the function being
minimized and $h(\vec{v})$ is a constraint
function that has value zero when its
argument satisfies the constraint. The
Lagrange multiplier is $\mu$. [Hor86:A.5]


Laguerre formula: A formula for com- 
puting the directed angle between two 
3D lines based on the cross ratio of four 
points. Two points arise where the two 
image lines intersect the ideal line (i.e.,
the line through the vanishing points) 
and the other two points are the ideal 
line’s absolute points (intersection of 
the ideal line and the absolute conic). 
[Fau93:2.4.8] 


Lambert’s law: The observed shading on
ideal diffuse reflectors is independent
of observer position and varies with
the angle $\theta$ between the surface normal
and the source direction: [JKS95:9.1.2]
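A short sketch of this dependence, assuming the usual cosine form of Lambert's law (brightness proportional to the cosine of the angle between normal and source direction, clamped at zero for surfaces facing away). The `albedo` parameter and function name are illustrative assumptions.

```python
import math

def lambertian_intensity(normal, light_dir, albedo=1.0):
    """Shading of an ideal diffuse surface: albedo * max(0, cos(theta))."""
    dot = sum(n * l for n, l in zip(normal, light_dir))
    norm = (math.sqrt(sum(n * n for n in normal))
            * math.sqrt(sum(l * l for l in light_dir)))
    return albedo * max(0.0, dot / norm)

up = (0.0, 0.0, 1.0)
head_on = lambertian_intensity(up, (0.0, 0.0, 1.0))            # theta = 0
angle = math.radians(60.0)
oblique = lambertian_intensity(up, (math.sin(angle), 0.0, math.cos(angle)))
behind = lambertian_intensity(up, (0.0, 0.0, -1.0))            # facing away
```

Note that the viewer position does not appear anywhere in the computation, which is the independence of observer position stated above.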


Lambertian surface: A surface whose
reflectance obeys Lambert’s law, more
commonly known as a matte surface.
These surfaces have equally bright
appearance from all viewpoints. The
shading of the surface thus depends
only on the relative direction of the
incident light. [FP03:4.3.3]


landmark detection: A general term for
detecting an image feature that is
commonly used for registration. The
registration might be between a model and
the image or it might be between two
images. A landmark might be task
specific, such as a component on an
electronic circuit card or an anatomical
feature (such as the tip of the nose), or it
might be a more general image feature
such as an interest point. [SB11:9.1]


LANDSAT: A series of satellites launched
by the United States of America that are
a common source of satellite images of
the earth. LANDSAT 7 was launched
in April 1999 and provides complete
coverage of the earth every 16 days.
[BB82:2.3.1]


Laplace–Beltrami operator: An operator
$\Delta$ from differential geometry which
describes the divergence of the gradient:
$\Delta f = \mathrm{div}\,\mathrm{grad}\, f$. [Cha84]


Laplacian: Loosely, the Laplacian of a
function is the sum of its second-order
partial derivatives. For example the
Laplacian of
$f(x, y, z) : \mathbb{R}^3 \to \mathbb{R}$ is
$\nabla^2 f(x, y, z) = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2}$.
In computer vision, the Laplacian
operator may be applied to an image,
by convolution with the Laplacian kernel,
one definition of which is given by
the sum of second derivative kernels
$[-1, 2, -1]$ and $[-1, 2, -1]^T$, with
zero padding to make the result 3 × 3:
[JKS95:5.3.1]
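The construction of the 3 × 3 kernel from the two one-dimensional second derivative kernels can be sketched as follows; the variable names are illustrative.

```python
second_derivative = [-1, 2, -1]

# Zero-pad the row kernel and the column kernel to 3 x 3.
horizontal = [[0, 0, 0], second_derivative, [0, 0, 0]]
vertical = [[0, v, 0] for v in second_derivative]

# The Laplacian kernel is their elementwise sum.
laplacian_kernel = [
    [h + v for h, v in zip(h_row, v_row)]
    for h_row, v_row in zip(horizontal, vertical)
]
```

With this sign convention the center weight is 4 and the four edge-adjacent neighbors are -1; the weights sum to zero, so the response on a constant image region is zero.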


Laplacian eigenspace: A feature
representation space where the
eigenvectors corresponding to the smallest
nonzero eigenvalues of the Laplacian
matrix are used. [SC08]


Laplacian matrix: Matrix representation
$L$ of a graph $G$ where:

$$L_{i,j}(G) = \begin{cases} 1 & \text{if } i = j \text{ and } d_j \neq 0 \\ -\frac{1}{\sqrt{d_i d_j}} & \text{if } i \text{ and } j \text{ adjacent} \\ 0 & \text{otherwise} \end{cases}$$

where $d$ is the vertex degree. [BCE00]
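A sketch of building this matrix from an adjacency matrix is given below. It assumes the normalized form in which adjacent vertices i and j receive the entry -1/sqrt(d_i d_j); the function name and the example graph are illustrative.

```python
import math

def normalized_laplacian(adjacency):
    """Laplacian matrix of an undirected graph given as a 0/1 adjacency matrix."""
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j and degree[j] != 0:
                L[i][j] = 1.0
            elif i != j and adjacency[i][j]:
                L[i][j] = -1.0 / math.sqrt(degree[i] * degree[j])
    return L

# Path graph on three vertices: 0 - 1 - 2.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
L = normalized_laplacian(path)
```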


Laplacian of Gaussian operator: A
low-level image operator that applies
the second-derivative Laplacian operator
($\nabla^2$) after a Gaussian smoothing
operation everywhere in an image. An
isotropic operator, often used as part of
a zero-crossing operator for edge
detection because the locations where the
value changes sign (positive to negative
or vice versa) of the output image are
located near the edges in the input image,
and the detail of the detected edges can
be controlled by use of the Gaussian
smoothing scale parameter. The figure
shows a mask that implements the
Laplacian of Gaussian operator with
smoothing parameter $\sigma = 1.4$:
[JKS95:5.4]

[Figure: Laplacian of Gaussian mask, $\sigma = 1.4$]


Laplacian pyramid: A compressed
representation in which a pyramid
of Laplacian images is created. At
each level of the scheme, the current
gray scale image has the Laplacian
applied to it. The next level gray
scale image is formed by Gaussian
smoothing and subsampling. At the
final level, the smoothed and
subsampled image is kept. The original
image can be approximately
reconstructed level by level through
expanding and smoothing the current level
image and then adding the Laplacian.
[FP03:9.2.1]


Laplacian smoothing: A mesh- 
smoothing algorithm where the 
current vertex is replaced by the mean 
value of the neighboring vertices. 
[HDZ05:8.1.3.2] 


large field: 2D capture angles in a wide 
field of view image-capture system. 
[BSLB01:Ch. 5]


laser: Light Amplification by Stimu- 
lated Emission of Radiation. A very 
bright light source often used for 
machine-vision applications because of 
its properties: most light is at a single 
spectral frequency; the light is coher- 
ent, so various interference effects can 
be exploited; and the light beam can 
be processed so that divergence is 
slight. Two common applications are 
for structured light triangulation and 
range sensing. [Hec87:14.2] 


laser illumination: A very bright light 
source useful because of its limited 
spectrum, bright power and coher- 
ence. See also laser. [Hec87:14.2] 


laser radar (LADAR): A LIDAR range 
sensor that uses laser light. See also 
laser range sensor. [BB82:2.3.2] 


laser range sensor: A range sensor that 
records the distance from the sensor to 
a target or target scene by detecting the 
image of a laser spot or stripe projected
onto the scene. These sensors are 
commonly based on structured light 
triangulation, time of flight or phase 
difference technologies. [TV98:2.5.3] 


laser speckle: A time-varying light pat- 
tern produced by interference of the 
reflection of light from a surface illumi- 
nated by a laser. [Hec87:14.2.2] 


laser stripe triangulation: A structured 
light triangulation system that uses 
laser light. For example, a projected 
plane of light that would normally 
result in a straight line in the cam- 
era image is distorted by an object in 
the scene and the distortion is pro- 
portional to the height of the object. 
The figure shows a typical triangulation
geometry: [JKS95:11.4.1]


latent behavior: Unquantified or hidden
geometric, statistical or relational prop- 
erties exhibited by a system or entity. 
Generally referring to emerging or hid- 
den behavior that is not readily mea- 
surable or quantifiable. See also latent 
variable model. 


latent Dirichlet allocation (LDA): A 
latent variable model where each 
observation vector is generated from 
a combination of underlying sources 
or topics. This is achieved by draw- 
ing a latent probability vector from a 
Dirichlet distribution and generating 
observations from the combination of 
sources defined by these probabilities. 
In contrast, a mixture model forces 
each observation vector to come from 
only one of the sources. LDA is used 
for identifying topics in documents. 
See also topic model and probabilistic 
latent semantic analysis. [Mur12:24.3] 


latent structure: Unquantified or hidden 
geometric, statistical or relational prop- 
erties exhibited by a system. 


latent variable model: A model of
observed variables X in terms of
some latent (or hidden) common
“causes”. A single discrete latent variable
yields a mixture model. Models
that use a continuous vector of
latent variables include factor analysis,
independent component analysis and
latent Dirichlet allocation (LDA).
[Bis06:Chs 9, 12]


lateral inhibition: A process in which
a given feature weakens or eliminates
nearby features. For example, in the 
Canny edge detector locally maximal 
intensity gradient magnitudes cause 
adjacent gradient values that lie across 
(rather than along) the edge to be set 
to zero. [Nev82:6.2] 


lattice: A repeating set of points that may 
be used to describe an object if at least 
one of the dimensions is related to cap- 
tured data. See also mesh model and 
surface mesh: 


Laws’ texture energy measure: A measure
of the amount of image intensity
variation at a pixel. The measure
is based on five 1D finite difference 
masks convolved orthogonally to give 
25 2D masks. The 25 masks are then 
convolved with the image. The outputs 
are smoothed nonlinearly and com- 
bined to give 14 contrast- and rotation- 
invariant measures. [PS06:4.6] 


layered motion model: A statistical
model that attempts to capture multiple
objects and surfaces in full motion
video by modeling each pixel as an 
additive mixture of Gaussian densi- 
ties, each associated with an object. 
[AS94] 


layered representation: A probabilistic
model that operates at different levels
of granularity and uses a multilayer hidden
Markov model. [OHG02]


learning: 1) The acquisition of knowledge
or skills. For a more specific definition
in our context see machine
learning. See also supervised learning
and unsupervised learning.
2) The processes of parameter
estimation and model selection.
[Bis06:pp. 1-4]


learning from observation: See
learning.


least mean square estimation: Also
known as “least square estimation”
or “mean square estimation”. Let $\vec{v}$
be the parameter vector that we are
searching for and $e_i(\vec{v})$ be the error
measure associated with the $i$th of $N$
data items. The error measure often
used is the Euclidean distance, the
algebraic distance or the Mahalanobis
distance between the $i$th data item
and a curve or surface being fitted,
that is parameterized by $\vec{v}$. Then the
mean square error is:

$$\frac{1}{N} \sum_{i=1}^{N} e_i(\vec{v})^2$$

The desired parameter vector $\vec{v}$ minimizes
this sum. [WP:Least_squares]
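
For a straight line $y = ax + b$ with the vertical residual as the error measure (one possible choice among those listed), the minimizing parameters have a closed form. The sketch below is illustrative only; the function names are hypothetical.

```python
def fit_line_least_squares(points):
    """Fit y = a*x + b by minimizing the mean of squared vertical residuals."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx                 # slope
    b = mean_y - a * mean_x       # intercept
    return a, b

def mean_square_error(points, a, b):
    """(1/N) * sum of squared residuals for the parameter vector (a, b)."""
    return sum((y - (a * x + b)) ** 2 for x, y in points) / len(points)

points = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
a, b = fit_line_least_squares(points)
error = mean_square_error(points, a, b)
```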


least median of squares estimation:
Let $\vec{v}$ be the parameter vector that
we are searching for and $e_i(\vec{v})$ be the
error associated with the $i$th of $N$ data
items. The error measure often used
is the Euclidean distance, the algebraic
distance or the Mahalanobis distance
between the $i$th data item and a curve
or surface being fitted that is parameterized
by $\vec{v}$. Then the median square
error is the middle value of the sorted
set $\{e_i(\vec{v})^2\}$. The desired parameter vector
$\vec{v}$ minimizes this median value.
This estimator usually requires more
computation for iterative and sorting
algorithms but can be more robust
to outliers than least mean square
estimation. [JKS95:13.6.3]
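
One simple (and computationally expensive) way to search for such a parameter vector is to try the line through every pair of data points and keep the candidate with the smallest median squared residual. This exhaustive search is an assumption made for the sketch; practical implementations usually sample pairs at random. The function name is hypothetical.

```python
from itertools import combinations
from statistics import median

def fit_line_lmeds(points):
    """Return (median squared residual, a, b) for the best line y = a*x + b."""
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # vertical candidates cannot be written as y = a*x + b
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        score = median((y - (a * x + b)) ** 2 for x, y in points)
        if best is None or score < best[0]:
            best = (score, a, b)
    return best

# Five points on y = x and one gross outlier
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0), (5.0, 40.0)]
score, a, b = fit_line_lmeds(points)
```

The outlier does not affect the result here, whereas a least mean square fit of the same data would be pulled strongly towards it.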


least square curve fitting: A least mean
square estimation process that fits a
parametric model of a curve or a line
to a collection of data points, usually
2D or 3D. Fitting often uses the Euclidean
distance, the algebraic distance or the
Mahalanobis distance to evaluate the
goodness of fit. [FP03:15.2-15.3]


least square estimation: See least mean 
square estimation. [WP:Least_squares] 


least square surface fitting: A least 
mean square estimation process that 
fits a parametric model of a surface to a 
collection of data points, usually range 
data. Fitting often uses the Euclidean 
distance, the algebraic distance or the 
Mahalanobis distance to evaluate the 
goodness of fit. The range image (left)
has planar and cylindrical surfaces fit- 
ted to the data (right): [JKS95:3.5] 


least squares fitting: A general term for 
a least mean square estimation process 
that fits some parametric shape, such 
as a curve or surface, to a collection of 
data. Fitting often uses the Euclidean 
distance, the algebraic distance or the 
Mahalanobis distance to evaluate the 
goodness of fit. [BB82:A1.9] 


leave-one-out test: A method for test- 
ing a solution in which one sample is 
left out of the training set and used 
instead for testing. This can be done for 
every sample. See also cross-validation. 
[FP03:22.1.5] 
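
The procedure can be sketched as a loop that holds out each sample in turn. The classifier interface and the 1D nearest-neighbor classifier below are hypothetical, chosen only to make the example self-contained.

```python
def leave_one_out_accuracy(samples, labels, classify):
    """classify(train_samples, train_labels, query) must return a predicted label."""
    correct = 0
    for i in range(len(samples)):
        # Train on everything except sample i, then test on sample i
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if classify(train_x, train_y, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)

def nearest_neighbor(train_x, train_y, query):
    """Label of the closest training sample (1D features)."""
    nearest = min(range(len(train_x)), key=lambda k: abs(train_x[k] - query))
    return train_y[nearest]

accuracy = leave_one_out_accuracy([0.0, 0.2, 0.4, 5.0, 5.3, 5.1],
                                  ["a", "a", "a", "b", "b", "b"],
                                  nearest_neighbor)
```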


LED: Light Emitting semiconductor 
Diode. Often used as detectable point 
light source markers or controllable 
illumination. [Gal90:7.1] 


left-handed coordinate system: A 3D
coordinate system with the XYZ axes
arranged as in the figure:

[Figure: coordinate axes, labeled +Y and +Z (INTO PAGE)]

The alternative is a right-handed
coordinate system.
[WP:Cartesian_coordinate_system#Orientation_and_handedness]


Legendre moment: The Legendre
moment of a piecewise continuous
function $f(x, y)$ with order
$(m, n)$ is

$$\frac{(2m+1)(2n+1)}{4} \int_{-1}^{1} \int_{-1}^{1} P_m(x) P_n(y) f(x, y) \, dx \, dy$$

where $P_m(x)$
is the $m$th order Legendre polynomial.
These moments can be used for characterizing
image data and images can
be reconstructed from the infinite set
of moments. [TC88]


Lempel-Ziv-Welch (LZW): A form of
file compression based on encoding 
commonly occurring byte sequences. 
This form of compression is used in 
the common GIF image file format. 
[Umb98:5.2.3] 
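
The core of the method is a table of byte sequences that grows as the data is read. The sketch below shows only this core, as an illustration; it emits plain integer codes and omits the variable-width code packing and table-reset codes that real file formats such as GIF add. Function names are hypothetical.

```python
def lzw_compress(data: bytes) -> list:
    """Encode bytes as a list of integer codes using a growing sequence table."""
    table = {bytes([i]): i for i in range(256)}   # all single bytes are predefined
    current, codes = b"", []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate                   # keep extending a known sequence
        else:
            codes.append(table[current])
            table[candidate] = len(table)         # learn the new sequence
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes

def lzw_decompress(codes: list) -> bytes:
    """Rebuild the bytes by regrowing the same table from the code stream."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            entry = previous + previous[:1]       # code defined by the step in progress
        else:
            raise ValueError("invalid LZW code")
        output.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(output)

codes = lzw_compress(b"ABABABABA")
restored = lzw_decompress(codes)
```

Nine input bytes become five codes in this example because the repeated sequences AB, ABA and BA are each sent as a single code.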


lens: A physical optical device for focus- 
ing incident light onto an imaging sur- 
face, such as photographic film or an 
electronic sensor. Lenses can also be 
used to change magnification and to 
enhance or modify a field of view. 
[Hor86:2.3]


lens distortion: Unexpected variation
in the light field passing through a lens.
Examples are radial lens distortion and
chromatic aberration, which usually arise
from how the lens differs from the ideal
lens. [JKS95:12.9]


lens equation: The simplest case of
a convex converging lens with focal
length $f$ perfectly focused on a target
at distance $D$ has distance $d$
between the lens and the image plane
as related by the lens equation
$\frac{1}{f} = \frac{1}{D} + \frac{1}{d}$. [JKS95:8.1]
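
As a worked example, the relation $1/f = 1/D + 1/d$ can be solved for the lens-to-image-plane distance $d$. The function name and the numeric values are hypothetical; all lengths must be in the same unit.

```python
def image_distance(focal_length, object_distance):
    """Solve 1/f = 1/D + 1/d for d."""
    if object_distance == focal_length:
        raise ValueError("a target at the focal distance focuses at infinity")
    return 1.0 / (1.0 / focal_length - 1.0 / object_distance)

# 50 mm lens focused on a target 1000 mm away
d = image_distance(focal_length=50.0, object_distance=1000.0)
```

Here $d = 50000/950$, about 52.6 mm, so the image plane sits slightly beyond the focal length and approaches it as the target moves further away.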


lens flare/glare: An optical effect, usually
unwanted, caused by stray light
reflecting or refracting through an 
imaging system. It is generally caused 
by a relatively high intensity light 
source outwith the immediate imaging 
environment. [DeL01:5.5] 


lens model: Mathematical treatment of 
the movement of light through the 
interfaces between substances of dif- 
ferent refractive indices (commonly 
referred to as a lens). See also camera 
model and lens equation. 


lens type: A general term for lens 
shapes and functions, such as convex 
or half-cylindrical, converging, magni- 
fying etc. [Hor86:2.3] 


level set: The set of data points $\vec{x}$ that
satisfy a given equation of the form:
$f(\vec{x}) = c$. Varying the value of $c$ gives
different sets of usually closely related
points. A visual analogy is of a geographic
surface and the ocean rising. If
the function $f()$ is the sea level, then
the level sets are the shore lines for different
sea levels $c$. The figure shows an
intensity image and the pixels at level
(brightness) 80: [SQ04:8.6.1]


level set tree: A tree structure which is 
composed of the separate parts of the 
level sets of a function. They are useful 
to visualize estimates of mixed multi- 
variate density functions, both their 
modes and their shape characteristics. 
[Kle04] 


Levenberg-Marquardt optimization:
A numerical multi-variate optimization
method that switches smoothly
between gradient descent when far
from a (local) optimum and a
second-order inverse Hessian (quadratic)
method when nearer. [FP03:3.1.2]


Levenshtein edit distance: An informa- 
tion theoretical metric for determining 
the difference between two strings of 
character sequences. It is computed as 
the minimum number of edits required 
to turn one string into the other. Valid 
edit functions are deletion, insertion 
and substitution. [Chr12:Ch. 5] 
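
The minimum number of edits is usually computed by dynamic programming over prefixes of the two strings. The sketch below is one common formulation, keeping only two rows of the table; the function name is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of deletions, insertions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))            # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]                             # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                          # delete char_a
                current[j - 1] + 1,                       # insert char_b
                previous[j - 1] + (char_a != char_b)))    # substitute, free on a match
        previous = current
    return previous[-1]

distance = levenshtein("kitten", "sitting")
```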


license plate recognition: A computer 
vision application that aims to identify 
a vehicle’s license plate from image 
data. Image data is often acquired from 
automatic cameras at places where 
vehicles slow down, such as bridges 
and toll barriers.
[WP:Automatic_number_plate_recognition]


LIDAR: Light Detection And Ranging. 
A range sensor using (usually) laser 
light. It can be based on the time of 
flight of a pulse of laser light or the 
phase shift of a waveform. The mea- 
surement could be of a single point 
or an array of measurements if the 
light beam is swept across the scene 
or object. [BB82:2.3.2] 


Lie group: A group that can be represented
as a continuous and differentiable
manifold of a space such
that group operations are also continuous.
An example of a Lie group
is the special orthogonal group
$SO(3) = \{R \in \mathbb{R}^{3 \times 3} : R^\top R = I, \det(R) = 1\}$ of rigid
3D rotations. [WP:Lie_group]


light: A general term for the electro- 
magnetic radiation used in many 
computer-vision applications. The 
term could refer to the illumination in 
the scene or to the irradiance coming 
from the scene onto the sensor. 
Most computer vision applications 
use visible light, infrared light or 
ultraviolet light. [Jai89:3.2] 


light field: An entire light environment 
consisting of light travelling in every 
direction at every point in space. 
[Sze10:13.3] 


light source: A general term for the
source of illumination in a scene, 
whether deliberate or accidental. The 
light source might be a point light 
source or an extended light source. 
[FP03:5.2] 


light source detection: The process of 
detecting the position of or direction 
to the light sources in the scene, even 
if not observable. The light sources 
are usually assumed to be point light 
sources for this process. [BB04] 


light source geometry: A general term 
referring to the light source placement 
and shape in a scene. [LR85] 


light source placement: A general term 
for the positions of the light sources 
in a scene. It may also refer to the 
care that machine-vision applications 
engineers take when placing the light 
sources so as to minimize unwanted 
lighting effects (such as shadows and 
specular reflections) and to enhance 
the visibility of desired scene struc- 
tures, e.g., by back lighting or oblique 
illumination. [Gum02] 


light stripe ranging: See structured 
light triangulation. 


light transport: In image-rendering 
techniques, the mathematical mod- 
els of energy transfer between inter- 
faces which alter whether surfaces of 
objects are visible. [Sze10:2.2.2] 


lightfield: A function that encodes the 
radiance on an empty point in space 
as a function of the point’s position 
and the direction of the illumination. A 
lightfield allows image-based rendering 
of new (unoccluded) scene views from
arbitrary positions within the lightfield. 
[WP:Light_field] 


lighting: A general term for the 
illumination in a scene, whether delib- 
erate or accidental. [Gal90:2.1.1] 


lighting capture: A method for record- 
ing the incident light, from all direc- 
tions, at a point in space. Performed as 
part of high dynamic range imaging for 
imagery relighting. [RHD+10:9.3] 


lightness: The estimated or perceived 
reflectance of a surface, when viewed 
in monochrome. [Nev82:6.1-6.2] 


lightpen: A user-interface device that 
allows people to indicate places on 
a computer screen by touching the 
screen at the desired place with the 
pen. The computer can then draw 
items, select actions etc. It is effec- 
tively a type of mouse that acts on 
the display screen instead of on a mat. 
[WP:Light_pen] 


likelihood function: In a statistical
model $M$ the connection between
the parameters $\theta$ and the data $D$
is through the likelihood function
$L(\theta|M) = p(D|\theta, M)$. Notice that the
likelihood is viewed as a function of
the parameters $\theta$ and $M$ with $D$ fixed.
[Bis06:1.2.3]


likelihood ratio: 1) The ratio of the 
likelihood functions of two different 
models for the same data. For exam- 
ple, in a Bayesian classifier for a binary 
classification problem, the ratio of pos- 
terior probabilities for the two classes 
would depend on the product of the 
likelihood ratio and the ratio of prior 
class probabilities. 
2) In statistics, the likelihood ratio 
test is used to compare two statistical 
models for the same data based on 
the ratio of their likelihood scores. 
[Was04:10.6] 


likelihood score: For a statistical
model $M$ with data $D$, the maximum
value of the log likelihood
function $L(\theta|M)$ with respect to $\theta$, i.e.,
$\max_\theta \log L(\theta|M)$. [KF09:18.3.1]
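
As a small illustration, take a Bernoulli model with a single parameter (the success probability) and search a grid of candidate values for the one that maximizes the log likelihood. The model, the data and the grid search are all assumptions made for the example; in practice the maximum is usually found analytically or by numerical optimization.

```python
import math

def bernoulli_log_likelihood(theta, data):
    """log p(D | theta) for independent 0/1 outcomes with success probability theta."""
    return sum(math.log(theta if outcome else 1.0 - theta) for outcome in data)

data = [1, 0, 1, 1, 0, 1, 1, 1]                      # 6 successes in 8 trials
candidates = [k / 100.0 for k in range(1, 100)]      # grid over (0, 1)
best_theta = max(candidates, key=lambda t: bernoulli_log_likelihood(t, data))
likelihood_score = bernoulli_log_likelihood(best_theta, data)
```

The maximizing value coincides with the observed success fraction 6/8, and the log likelihood at that value is the likelihood score of the model for this data.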


limb extraction: A process of image 
interpretation that extracts the arms 
or legs of people or animals (e.g., for 
tracking) or extracts the barely visible 
edge of a curved surface as it curves 
away from an observer (derived from 
an astronomical term): 


[Figure: LIMB]


See also occluding contour. [TYOOO] 


limited angle tomography: In com- 
puted tomography (medical CT) it is 
rarely possible to collect data over the 
full angular range. In such cases data 
reconstruction from available informa- 
tion must be used, which can be 
achieved via a range of techniques. 
[DB98] 


line: Usually refers to a straight ideal line 
that passes through two points, but 
may also refer to a general curve mark- 
ing, e.g., on paper. [Sch89:App. 1] 


line cotermination: When two lines 
have endpoints in exactly or nearly the 
same location: [Bie85] 


[Figure: LINE COTERMINATIONS]


line detection operator: A feature 
detection process that detects lines. 
Depending on the specific opera- 
tor, locally linear line segments may 
be detected or straight lines might 
be globally detected. Note that this 
detects lines as contrasted with edges. 
[Nev82:7.3] 


line-drawing analysis: 1) Analysis of 
hand-made or CAD drawings to extract 
a symbolic or shape description. For 
example, research has investigated 
extracting 3D building models from 
CAD drawings. Another application 
is the analysis of hand-drawn circuit 
sketches to form a circuit description. 
2) Analysis of the line junctions in 
a polyhedral blocks world scene, in 
order to understand the 3D structure 
of the scene. [Nal93:Ch. 4] 


line fitting: A curve-fitting problem 
where the objective is to estimate 
the parameters of a straight line that 
best interpolates given point data. 
[DH73:9.2] 


line following: See line grouping. 


line grouping: Generally refers to the
process of creating a longer curve
by grouping together shorter fragments
found by a line detection
operator. These might be short connecting
locally detected line fragments
or longer straight line segments separated
by a gap. May also refer to
the grouping of line segments on the
basis of principles such as parallelism.
See also edge tracking, perceptual
organization and Gestalt.


line intersection: Where two or more 
lines intersect at a point. The lines 
cross or meet at a line junction: 
[Hor86:15.6] 


[Figure: LINE INTERSECTIONS]


line junction: The point at which two or 
more lines meet. See junction labeling. 
[Nal93:4.1.1] 


line label: In an ideal polyhedral blocks 
world scene, lines arise from only a 
limited set of physical situations such 
as convex or concave surface shape 
discontinuities (fold edges); occluding 
contours where a fold edge is seen 
against the background (blade edge); 
crack edges where two polyhedra have 
aligned edges; or shadow edges. Line 
labels identify the type of line as one 
of these types. Assigning labels is one 
step in scene understanding that helps 
deduce the 3D structure of the scene. 
See also junction label. The figure 
shows the usual line labels for convex 
(+), concave (-) and occluding (>)
edges: [Hor86:15.6]


line linking: See line grouping. 


line matching: The process of making a 
correspondence between the lines in 
two sets. One set might be a geometric 
model such as used in model-based 
recognition, model registration or 
alignment. Alternatively, the lines may
have been extracted from different
images, as when doing feature-based
stereo or estimating the epipolar
geometry between the two images.
[SZ97]


line moment: Similar to the traditional
area moment but calculated
only at points $(x(s), y(s))$ along the
object contour. The $pq$th moment is:
$\int x(s)^p y(s)^q \, ds$. The infinite set of line
moments uniquely determines the contour.
[WL93]


line moment invariant: A set of invari- 
ant values computable from the line 
moments. These may be invariant 
to translation, scaling and rotation. 
[LG95] 


line of sight: A straight line from the 
observer or camera into the scene, usu- 
ally to some target: [JKS95:1.4] 


[Figure: LINE OF SIGHT]


line scan camera: A camera that uses 
a solid-state or semiconductor (e.g., 
CMOS) linear array sensor, in which 
all of the photosensitive elements are 
in a single 1D line. Typical line scan 
cameras have between 32 and 8192 
elements. These sensors are used for a 
variety of machine-vision applications 
such as scanning, flow process control 
and position sensing. [BT88:Ch. 3] 


line segmentation: See curve
segmentation.


line spread function: Describes how an 
ideal infinitely thin line would be dis- 
torted after passing through an opti- 
cal system. Normally, this can be com- 
puted by integrating the point spread 
functions of an infinite number of 
points along the line. [Hec87:11.3.5] 


line thinning: See thinning. 


linear: 1) Having a line-like form.
2) A mathematical description for
a process in which the relationship
between some input variables $\vec{x}$ and
some output variables $\vec{y}$ is given by
$\vec{y} = A\vec{x}$ where $A$ is a matrix. [Hor86:6.1]


linear array sensor: A solid-state or 
semiconductor (e.g., CMOS) sensor in 
which all of the photosensitive ele- 
ments are in a single 1D line. Typical 
linear array sensors have between 32 
and 8192 elements and are used in line 
scan cameras. [IJL87] 


linear discriminant analysis: See 
linear discriminant function. 


linear discriminant function: A basic
classification process that determines
which of two classes or cases a structure
belongs to. Assume a feature
vector $\vec{x}$ that is based on observations
of a structure and is augmented by an
extra term with value 1. The linear
discriminant function uses the sign of
the linear function $l = \vec{a} \cdot \vec{x} = \sum_i a_i x_i$,
for a given coefficient vector $\vec{a}$. For
example, to discriminate between unit
side squares and unit diameter circles
based on the area $A$, the feature vector
is $\vec{x} = (A, 1)^\top$ and the coefficient vector
is $\vec{a} = (1, -0.89)^\top$. If $l > 0$, then the
structure is a square, otherwise it is a
circle. [SB11:11.6]
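
The square-versus-circle example can be written out directly. The sketch uses the feature and coefficient vectors given in the entry; the function names are hypothetical.

```python
import math

def linear_discriminant(features, coefficients):
    """Dot product of the coefficient vector and the augmented feature vector."""
    return sum(a * x for a, x in zip(coefficients, features))

def classify_shape(area):
    x = (area, 1.0)          # feature vector augmented with a constant 1
    a = (1.0, -0.89)         # coefficient vector from the entry
    return "square" if linear_discriminant(x, a) > 0 else "circle"

square_label = classify_shape(1.0)               # unit side square, area 1
circle_label = classify_shape(math.pi / 4.0)     # unit diameter circle, area pi/4
```

The threshold 0.89 lies roughly midway between the two areas, 1 and about 0.785.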


linear features: A general term for 
features that are locally or globally 
straight, such as lines or straight edges. 
[MN84] 


linear filter: A filter whose output is
a weighted sum of its inputs, i.e.,
all terms in the filter are either constants
or variables. If $\{x_i\}$ are the inputs
(which may be pixel values from a local
neighborhood, from the same position
in different images of the same
scene etc.), then the linear filter output
would be $\sum_i a_i x_i + a_0$, for some constants
$a_i$. [FP03:Ch. 7]


linear regression: Estimation of the
parameters of a linear relationship
between two random variables $X$
and $Y$ given sets of samples $\vec{x}_i$ and
$\vec{y}_i$. The objective is to estimate the
matrix $A$ and vector $\vec{a}$ that minimize
the residual $r(A, \vec{a}) = \sum_i \|\vec{y}_i - A\vec{x}_i - \vec{a}\|^2$.
In this form, the $\vec{x}_i$ are
assumed to be noise-free quantities.
When both variables are subject to
error, orthogonal regression is preferred.
[WP:Linear_regression]


linear transformation: A mathematical
transformation of a set of values
by addition and multiplication by constants.
If the set of values is a vector $\vec{x}$,
the general linear transformation produces
another vector $\vec{y} = A\vec{x}$, where $\vec{y}$
need not have the same dimension as
$\vec{x}$ and $A$ is a constant matrix (i.e., is not
a function of $\vec{x}$). [SQ04:2.2.1]


linearly non-separable: For a 
classification problem, a dataset whose 
classes are not linearly separable is 
said to be linearly non-separable. 
[DH73:Ch. 5] 


linearly separable: A binary
classification problem is linearly
separable if the two classes can
be separated by a hyperplane. In
the multiclass case, a problem is
linearly separable if the decision
boundaries can be represented by
a linear classifier, i.e., a classifier
where an input $\vec{x}$ is assigned to the
class $i = 1, \ldots, c$ which maximizes
$\vec{w}_i^\top \vec{x} + w_{i0}$. [DH73:Ch. 5]

lip shape analysis: An application of 
computer vision to understanding the 
position and shape of human lips as 
part of face analysis. The goal might 
be face recognition or expression 
understanding. [LBDX03] 


lip tracking: An application of computer 
vision to following the position and 
shape of human lips in a video 
sequence. The goal might be for lip 
reading, augmentation of deaf sign 
analysis or focusing of resolution dur- 
ing image compression. [KDB96] 


local: A local property of a mathematical 
object is one that is defined in terms 
only of a small neighborhood of the 
object, e.g., curvature. In image pro- 
cessing, a local operator operates on 
a small number of nearby pixels at a 
time. [Hor86:4.2] 


local binary pattern: Given a local 
neighborhood about a point, use the 
value of the central pixel to threshold 
the neighborhood. This creates a local 
descriptor of the gray scale struc- 
ture that is invariant to lightness and 
contrast transformations, that can be 
used to create local texture primitives. 
[PS06:4.7] 
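
A minimal sketch for a 3x3 neighborhood: each of the eight neighbors contributes one bit, set when the neighbor is at least as bright as the central pixel. The bit ordering (clockwise from the top-left neighbor) and the use of `>=` are arbitrary choices made for this illustration.

```python
def local_binary_pattern(image, row, col):
    """8-neighbor binary code at an interior pixel of a 2D list of gray values."""
    center = image[row][col]
    # Clockwise from the top-left neighbor; any fixed ordering would do
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (d_row, d_col) in enumerate(offsets):
        if image[row + d_row][col + d_col] >= center:
            code |= 1 << bit
    return code

patch = [[5, 9, 1],
         [4, 6, 7],
         [2, 8, 3]]
code = local_binary_pattern(patch, 1, 1)

# The same patch after a brightness and contrast change (2 * value + 10)
brighter = [[2 * value + 10 for value in row] for row in patch]
code_after_change = local_binary_pattern(brighter, 1, 1)
```

Both codes are equal because only the ordering of each neighbor relative to the center matters, which is the source of the invariance mentioned in the entry.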


local contrast adjustment: A form
of contrast enhancement that adjusts
pixel intensities based on the values of
nearby pixels instead of the values of all
pixels in the image. The figure shows
(left) an original image and (right) an
image that has the eye area’s brightness
enhanced while maintaining the
background’s contrast: [KZQ05]


local curvature estimation: A part of
surface or curve shape estimation that
estimates the curvature at a given point
based on the position of nearby parts of
the curve or surface. For example, the
curve $y = \sin(x)$ has zero local curvature
at the point $x = 0$ (i.e., the curve is
locally uncurved or straight), although
the curve has nonzero local curvature
at other points (e.g., at $x = \pi/2$). See
also differential geometry. [WVVG01]
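
For a planar curve $y = f(x)$ the curvature is $|f''| / (1 + f'^2)^{3/2}$, and both derivatives can be estimated from three nearby samples. The sketch below uses central finite differences; the function name and the step size `h` are assumptions made for the example.

```python
import math

def local_curvature(f, x, h=1e-3):
    """Estimate the curvature of y = f(x) at x from samples at x - h, x and x + h."""
    first = (f(x + h) - f(x - h)) / (2.0 * h)               # estimate of f'(x)
    second = (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)   # estimate of f''(x)
    return abs(second) / (1.0 + first * first) ** 1.5

at_zero = local_curvature(math.sin, 0.0)
at_peak = local_curvature(math.sin, math.pi / 2.0)
```

For the sine curve the estimate is close to 0 at $x = 0$ and close to 1 at the peak, $x = \pi/2$.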


Local feature focus (LFF) method:
A 2D part identification and pose
estimation algorithm that can cope
with large amounts of occlusion of the
parts. The algorithm uses a mixture
of property-based classifiers, graph
models and geometric models. The key
identification process is based around
local configurations of image features
that are more robust to occlusion.
[BC82]


local invariant: See local point
invariant.


local motion event: A spatio-temporal
event within a video sequence describing
the movement of an object locally
within the viewed scene over some 
restricted period of time. See also 
spatio-temporal analysis. [SCZ11] 


local operator: An image-processing
operator that computes its output at
each pixel from the values of the
nearby pixels instead of using all or
most of the pixels in the image.
[JKS95:1.7.2]


local point invariant: A property of
local shape or intensity that is invariant
to translation, rotation, scaling,
contrast or brightness changes etc. 
For example, a surface’s Gaussian 
curvature is invariant to change in posi- 
tion. [Low99] 


local surface shape: The shape of a surface
in a “small” region around a point,
often classified into one of a small number
of surface shape classifications.
Computed as a function of the surface
curvatures. [KvD92]


local ternary patterns (LTP): A local
texture descriptor which classifies all
points in a region as being:

• approximately equal to the central
  point (where the absolute difference
  is less than some threshold);

• less than the central point by more
  than the threshold;

• greater than the central point by
  more than the threshold.

Local ternary patterns are an extension
of local binary patterns. [TT07]


local variance contrast: The variance
of the pixel values computed in a
neighborhood about each pixel. Contrast
is the difference between the
larger and smaller values of this variance.
Large values of this property
occur in highly textured or varying
areas. [FYF+01]


localization: Often referred to as “spatial
localization”. The identification
of the position of a feature, object
or target within an image, either in
terms of coordinates or possibly a
bounding box. Occasionally localization
may mean temporal localization
where the timing of an event, such as
the abandonment of an object, is determined
within a video sequence.


locally linear embedding (LLE): An
algorithm for nonlinear dimensionality
reduction, where the data has manifold
structure. Given $n$ data points in high
dimension, LLE returns $n$ points in a
lower dimensional space, so that each
point can be reconstructed as a linear
combination of its neighbors. The
LLE algorithm achieves this by solving
a particular eigen-decomposition.
[HTF08:14.9]


location-based retrieval: The retrieval
of information based on the position of
a mobile device. This term is most frequently
associated with mobile phone
services but can equally be relevant in
augmented reality applications where
information may be overlaid on images
or videos in a particular location.
[SAW94b]


log-likelihood: The log of the likelihood
function.


log-normal distribution: If $X$ is a random
variable which follows a normal
distribution (or Gaussian distribution),
then $Y = e^X$ is said to follow a
log-normal distribution; it has support in
$(0, \infty)$. [MKB79:p. 43]


log-polar image: A representation in
which the pixels are not in the standard
Cartesian layout but instead have
a space-varying layout. The image is
parameterized by a polar coordinate
$\theta$ and a radial coordinate $r$. However,
unlike polar coordinates, the
radial distance increases exponentially
as $r$ grows. The mapping from position
$(\theta, r)$ to Cartesian coordinates is
$(\beta^r \cos(\theta), \beta^r \sin(\theta))$, where $\beta$ is some
design parameter. Further, the amount
of area of the image plane represented
by each pixel grows exponentially
with $r$, although the precise
pixel size depends on factors such as
the amount of pixel overlap. See also
foveal image. The receptive fields of
a log-polar image (courtesy of Herman
Gomes) can be seen in the outer rings
of the figure: [WZ00]
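
The mapping can be sketched as follows, reading the design parameter as the base of the exponential radial growth. The function name and the value chosen for the base are hypothetical.

```python
import math

def log_polar_to_cartesian(theta, r, base=2.0):
    """Map log-polar position (theta, r) to Cartesian (x, y) = (base^r cos, base^r sin)."""
    radius = base ** r
    return radius * math.cos(theta), radius * math.sin(theta)

# Distance from the image center of the first four rings along theta = 0
ring_radii = [math.hypot(*log_polar_to_cartesian(0.0, r)) for r in range(4)]
```

Successive rings lie at radii 1, 2, 4 and 8 for a base of 2, so each ring covers more of the image plane than the one inside it.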


log-polar stereo: A form of stereo vision 


color re-mapping or standard functions 


in which the input images come from 
log-polar sensors instead of the stan- 
dard Cartesian layout. [GT00] 


logarithmic transformation: See pixel 


of integer pixel values (e.g., the loga- 
rithm of a pixel’s value). [Hor86:10.14] 


lossless compression: A category of 


image compression in which the orig- 


logarithm operator. 


logical object representation: An 
object representation based on some 
logical formalism such as predicate cal- 
culus. For example, a square can be 
defined as: square(s) <=> 
polygon(s) & number _of _sides(s, 4) 
& Ve, Ve2(e; Æ e2 & 

side_of (s, e1) & side_of(s, e2) 

& length(e,) = length(e2) 

& (parallel(e,, e2) 

| perpendicular (e,, e2))). [RM89] 


logistic regression: A widely-used 
model for two-class classification. Let 
an input feature vector be denoted as 
x. Subsequently, the logistic regression 
model predicts p(C\|X) = o(w!x+ 
Wo), where o denotes the logistic sig- 
moid function o() =(1+e%)"!, w 
is a vector of parameters, and wo is 
a bias (or offset) term. p(C2|x) = 1 — 
p(Ci|x). The parameters can be fit- 
ted by maximum likelihood estimation 
and the optimization problem can be 
shown to be convex (i.e., no local 
optima). [Bis06:4.3.2] 


long baseline stereo: See wide baseline 
stereo. 


long motion sequence: A video 
sequence of more than just a few 
frames in which there is significant 
camera or scene motion. The essen- 
tial idea is that the 3D scene structure 
can be inferred effectively by a stereo 
vision process. The matched image 
features can be tracked through the 
sequence, instead of having to solve 
the stereo correspondence problem. If 
a long sequence is not available, then 
analysis could use optical flow or short 
baseline stereo. 


look-up table: Given a finite set of input 
values {x; } and a function on these val- 
ues, f(x), a look-up table records the 
values {(x;, f(%))} so that the value 
of the function f() can be looked up 
directly rather than recomputed each 
time. Look-up tables can be used for 


low-angle 


inal image can be exactly recon- 
structed from the compressed image. 
This contrasts with lossy compression. 
(SB11:1.3.2] 


lossy compression: A category of image 


compression in which the original 
image cannot be exactly reconstructed 
from the compressed image. The goal 
is to lose insignificant image details 
(e.g., noise) while limiting percep- 
tion of changes to the image appear- 
ance. Lossy algorithms generally pro- 
duce greater compression than lossless 
compression. [SB11:1.3.2] 


low-activity region: A region in a scene
where little of interest occurs. For
example in a surveillance application, 
little of interest occurs away from the 
ground plane (e.g., on the upper parts 
of buildings). [CK99] 

low-angle illumination: A
machine-vision technique, often
used for industrial vision, where a light
source (usually a point light source)
is placed so that a ray of light from
the source to the inspection point is 
almost perpendicular to the surface 
normal at that point. The situation 
can also arise naturally, e.g., from the 
sun position at dawn or dusk. One 
consequence of this low angle is that 
shallow surface shape defects and 
cracks cast strong shadows that may 
simplify the inspection process: 


[Figure: camera, light source and target point]


[Bil02:1.2.5]


low frequency: Usually referring to low
spatial frequency in the context of 
computer vision. The low-frequency 
components of an image are the slowly 
changing intensity components of the 
image, such as large regions of bright 
and dark pixels. If low temporal fre- 
quency is the intended meaning, then 
low frequency refers to slowly chang- 
ing patterns of brightness or dark- 
ness at the same pixel in a video 
sequence. This figure shows the low- 
frequency components of an image: 
[WP:Low_frequency] 




low-level vision: A general and somewhat
imprecisely defined term (the
definition is contentious) for the ini- 
tial stages of image analysis in a vision 
system. It can also be used for the 
initial stages of processing in biologi- 
cal vision systems. Roughly, low-level 
vision refers to the first few stages of 
processing applied to intensity images. 
Some authors use this term only for 
operations that result in other images. 
Edge detection is about where most 
authors would say that low-level vision 
ends and middle-level vision starts. 
[BB82:1.2] 


low-pass filter: This term is imported
from 1D signal-processing theory into
image processing. The term “low” is a 
shorthand for “low frequency”. In the 
context of a single image, that means 
low spatial frequency, i.e., intensity 
patterns that change over many pixels. 
A low-pass filter applied to an image 
thus leaves the low spatial frequency 
patterns, or large, slowly changing pat- 
terns, and removes the high spatial fre- 
quency components (sharp edges and 
noise). A low-pass filter is a kind of 
smoothing filter or noise reduction fil- 
ter. Alternatively, filtering is applied to 
the changing values of a given pixel
over an image sequence. In this case
the pixel values can be treated as a 
sampled time sequence and the origi- 
nal signal-processing definition of “low 
pass filter” is appropriate. Filtering this 
way removes rapid temporal changes. 
See also high-pass filter. The figure 
shows an image and its low-pass fil- 
tered version: [Gal90:6.2] 
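
The temporal case of the low-pass filter can be sketched for a single pixel whose values over a sequence are treated as a sampled 1D signal. A moving average is used here as one simple low-pass filter; the function name, window width and test signal are assumptions for illustration only.

```python
def low_pass(signal, width=3):
    """Moving-average low-pass filter; the window is clipped at the ends."""
    half = width // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

# Hypothetical pixel values over 10 frames: a slow ramp (low frequency)
# plus a +/-1 alternation from frame to frame (high frequency).
ramp = [0.5 * i for i in range(10)]
noisy = [r + (1 if i % 2 == 0 else -1) for i, r in enumerate(ramp)]
smooth = low_pass(noisy, width=3)
```

The filtered values stay close to the slow ramp while the rapid alternation is strongly attenuated; the same averaging applied over a spatial neighborhood gives the single-image case.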


Lowe’s curve segmentation method:
An algorithm that tries to split a curve
into a sequence of straight line seg- 
ments. The algorithm has three main 
stages: 

1. Recursive splitting of segments into 
two shorter, but more line-like, seg- 
ments until all remaining segments 
are very short. This forms a tree of 
segments. 

2. Merging segments in the tree in a 
bottom-up fashion according to a 
straightness measure. 

3. Extracting the remaining unmerged 
segments from the tree as the seg- 
mentation result. 

[Low87a] 


Lucas-Kanade method: A differential
method for the computation of optical
flow in a video. It assumes that the flow
is constant in a small area around the
pixel under consideration and determines
the flow by a least-squares
estimation over all pixels within that
area. [LK81]
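
The least-squares step of the Lucas-Kanade method can be sketched as follows for one pixel and two gray-level frames. This is a minimal single-scale sketch, not the original authors' implementation: the central-difference gradients, the window size and the synthetic test frames are all assumptions made for this example.

```python
def lucas_kanade(img0, img1, cx, cy, half=2):
    """Estimate the flow (u, v) at pixel (cx, cy), assuming the flow is
    constant over a (2*half+1) x (2*half+1) window."""
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            ix = (img0[y][x + 1] - img0[y][x - 1]) / 2.0  # spatial gradient in x
            iy = (img0[y + 1][x] - img0[y - 1][x]) / 2.0  # spatial gradient in y
            it = img1[y][x] - img0[y][x]                  # temporal difference
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        return None  # gradients do not constrain both flow components
    # Solve the normal equations [sxx sxy; sxy syy] [u v]^T = -[sxt syt]^T.
    u = (-sxt * syy + syt * sxy) / det
    v = (-syt * sxx + sxt * sxy) / det
    return u, v

def frame(dx, dy, size=16):
    """Synthetic smooth image shifted by (dx, dy) pixels (hypothetical data)."""
    f = lambda x, y: x * x + 2.0 * y * y + x * y
    return [[f(x - dx, y - dy) for x in range(size)] for y in range(size)]

u, v = lucas_kanade(frame(0.0, 0.0), frame(0.2, 0.1), cx=8, cy=8)
```

For this small sub-pixel shift the estimate should be close to (0.2, 0.1); larger motions usually need an image pyramid or iteration, which this sketch omits.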


luma: The luminance component of
light. Color can be divided into luma
and chroma. [FP03:6.3.2] 


lumigraph: A device, invented by
Fischinger in the 1940s, that produces
beams of colored light by a person
pressing on a screen. More recently in
computer vision, “lumigraph” refers to
a representation used to store (a sub-
set of) the plenoptic function (i.e., the 
flow of light in all positions and direc- 
tions). [Sze10:13.3] 


luminance: The measured intensity
from a portion of a scene. [Jai89:3.2]


luminance efficiency: The sensor-specific
function $V(\lambda)$ that determines
how the observed light $I(x, y, \lambda)$ at
sensor position $(x, y)$ of wavelength $\lambda$
contributes to the measured luminance
$l(x, y) = \int I(x, y, \lambda) V(\lambda)\, d\lambda$
at that point. [Jai89:3.2]


luminosity coefficient: A component
of the tristimulus theory of color
perception. The luminosity coefficient
is the amount of luminance contributed
by a given primary color to the
total perceived luminance. [Jai89:3.8]


luminous flux: The amount of light at
all wavelengths that passes through
a given region in space. Proportional
to perceived brightness.
[WP:Luminous_flux]


M-estimation: A robust generalization of
least-squares estimation and maximum
likelihood estimation. [FP03:15.5.1]


Mach band effect: An effect in the 
human visual system in which an 
observer perceives a variation in 
brightness at the edges of a region 
of constant brightness. This variation 
makes the region appear slightly darker 
when it is beside a brighter region and 
slightly brighter when it is beside a 
darker region. [Jai89:3.2] 


machine learning: A set of methods for
the automated analysis of structure in
data. There are two main strands of work:

• unsupervised learning or descriptive
modeling, where the goal is to find
interesting patterns or structure in
the data;

• supervised learning or predictive
modeling, where the goal is to predict
the value of one or more variables
given some others.

These goals are similar to those of
data mining, but the focus is more
on autonomous machine performance,
rather than enabling humans to learn
from the data. [Mur12:p. 1]


machine vision: A general term for
processing image data by a computer;
often synonymous with computer
vision. There is a slight tendency
to use “machine vision” for practical
vision systems, such as industrial
vision, and “computer vision” for more
exploratory vision systems or for systems
that aim at some of the competences
of the human vision system.
[JKS95:1.1]


macrotexture: The intensity pattern
formed by spatially organized texture
primitives on a surface, such as
a tiling. Contrast with microtexture.
[JKS95:7.1]


magnetic resonance imaging (MRI):
See nuclear magnetic resonance.


magnification: The process of enlargement
(e.g., of an image). The amount
of enlargement applied. [Jai89:7.4]


magnitude-retrieval problem: The
reconstruction of a signal based on
only the phase (not the magnitude) of
the Fourier transform. [McD04]


Mahalanobis distance: The distance
between two N-dimensional points
scaled by the statistical variation in
each component of the point. For
example, if $\vec{x}$ and $\vec{y}$ are two points from
the same distribution that has covariance
matrix $C$ then the Mahalanobis
distance is given by

$((\vec{x} - \vec{y})^{\top} C^{-1} (\vec{x} - \vec{y}))^{1/2}$

The Mahalanobis distance is the same
as the Euclidean distance if the covariance
matrix is the identity matrix. A
common usage in computer vision systems
is for comparing feature vectors
whose elements are quantities having
different ranges and amounts of variation,
such as a 2-vector recording
the properties of area and perimeter.
[SB11:11.8]


mammogram analysis: Analyzing an
X-ray of the human female breast (a
mammogram), usually for the detection
of potential signs of cancerous
growths. [FB03]
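
For the Mahalanobis distance entry, a minimal 2D sketch is given here. The function name and the example covariance values are assumptions; the explicit 2 x 2 inverse is used only to keep the example free of external libraries.

```python
import math

def mahalanobis_2d(x, y, cov):
    """Mahalanobis distance between 2D points x and y for covariance cov."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]  # inverse of a 2x2 matrix
    diff = [x[0] - y[0], x[1] - y[1]]
    q = sum(diff[i] * inv[i][j] * diff[j] for i in range(2) for j in range(2))
    return math.sqrt(q)

# With the identity covariance the result equals the Euclidean distance.
d_identity = mahalanobis_2d((0.0, 0.0), (3.0, 4.0), [[1.0, 0.0], [0.0, 1.0]])

# Hypothetical (area, perimeter) features with very different variances:
# a difference of 10 in area counts the same as a difference of 1 in perimeter.
d_scaled = mahalanobis_2d((0.0, 0.0), (10.0, 1.0), [[100.0, 0.0], [0.0, 1.0]])
```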


man in the loop: The inclusion of a per- 
son within the control system of, e.g., 
an unmanned vehicle such as a drone 
aircraft. The man in the loop is required 
until these systems can be shown to be
reliable at performing tasks completely
autonomously. [FDS90] 


Manhattan distance: Also called the
Manhattan metric. Motivated by the
problem of only being able to walk
along city blocks in dense urban
environments, the distance between points
$(x_1, y_1)$ and $(x_2, y_2)$ is
$|x_1 - x_2| + |y_1 - y_2|$. [BB82:2.2.6]
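
A short sketch of the Manhattan distance; the function name is an assumption, and the code accepts points of any equal dimension rather than only 2D points.

```python
def manhattan(p, q):
    """Sum of absolute coordinate differences between points p and q."""
    return sum(abs(a - b) for a, b in zip(p, q))

d = manhattan((1, 2), (4, 6))  # |1 - 4| + |2 - 6| = 7
```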


Manhattan world: Scenes consisting 
of planar surfaces in three dominant 
orthogonal planes. Many real-world, 
manmade objects, such as the interior 
and exterior of most buildings, can 
be considered as Manhattan world
scenes: [CW99]


manifold: A topological space that is 
locally Euclidean. Data analysis often 
involves dimensionality reduction, 
where a low-dimensional manifold 
is embedded in a higher-dimensional 
space. A linear manifold (or subspace) 
may be extracted using principal 
component analysis (PCA); for non- 
linear methods see dimensionality 
reduction. [Lov10:2.1] 


many-to-many graph matching: A 
version of graph matching in which 
clusters of vertices from one graph are 
matched with clusters from another. 
This is particularly useful in vision 
applications where graphs cannot be 
perfectly matched using one-to-one 
matching. 


[Figure: clusters of vertices in Graph A matched to clusters in Graph B]


[DSD+04] 


many view stereo: See multi-view 
stereo. 


MAP: See maximum a posteriori
probability.


map analysis: Analyzing an image of 
a map (e.g., obtained with a flat-bed 
scanner) in order to extract a symbolic 
description of the terrain described 
by the map. This is now largely 
obsolete, given digital map databases. 
[WP:Map_analysis] 


map registration: The registration of 
a symbolic map to (usually) aerial or 
satellite image data. This may require 
identifying roads, buildings or land fea- 
tures. The figure shows a road model 
(black) overlaying an aerial image: 
[LM07] 


marching cubes: An algorithm for deter- 
mining a polygonal approximation to 
a surface from a voxel representation. 
Each vertex on the voxel is given a 
binary value depending on whether 
it is inside or outside the object and 
polygonal surfaces are defined for each 
group of eight vertices forming a vir- 
tual cube. The algorithm is very effi- 
cient as there are only 256 possible 
arrangements of each group of eight 
vertices. Given a function f() on the 
voxels, the algorithm estimates the 
position of the surface $f(x) = c$ for
some c. This requires estimating where 
the surface intersects each of the 12 
edges of a voxel. Many implementations
propagate from one voxel to its
neighbors, hence the term “marching”.
[LC87] 


marginal distribution: A probability 
distribution of a random variable X 
derived from the joint probability dis- 
tribution of a number of random 
variables integrated over all variables 
except X. [WP:Marginal_distribution] 


marginal likelihood: A Bayesian
statistical model $M$ has parameters $\theta$.
Given data $D$ the marginal likelihood
$p(D|M) = \int p(D|\theta, M)\, p(\theta|M)\, d\theta$
can be computed by integrating the
likelihood function over the prior
distribution. The marginal likelihood
is used in Bayesian model selection.
[Bis06:3.4]


marked point process: A random pro- 
cess of points, an instance of which 
will consist of a set of points generally 
in space (e.g., the locations of occur- 
rences of particular features or events) 
or time (e.g., the times of occurrences 
of particular events). In a marked point 
process, parameters (the “marks”) are 
associated with each point (in addition 
to the location or time information). 
[Cre93:8.7] 


markerless motion capture: Capture 
of the way in which an object, person 
or animal moves through the analysis 
of video. Critically, the video is taken 
without physical markers attached to 
the subject. See also motion capture. 
[CMC+06] 


Markov chain: A state system model 
in which the probability of the next 
state transition depends only on the 
current state. The figure shows tran- 
sitions between four states including 
self transitions: [Bis06:11.2.1] 
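
A Markov chain can be sketched by repeatedly applying a transition matrix to a distribution over states. The two-state matrix below is a made-up example (not the four-state chain of the figure), and the function name is an assumption.

```python
def step(dist, P):
    """One transition: new_dist[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

# P[i][j] is the probability of moving from state i to state j,
# including self transitions on the diagonal. Each row sums to 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]

dist = [1.0, 0.0]  # start in state 0 with certainty
for _ in range(50):
    dist = step(dist, P)
```

After many steps the distribution settles near the stationary distribution of this chain, which is (5/6, 1/6) for the matrix above.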


Markov chain Monte Carlo (MCMC): 
A statistical inference method use- 
ful for estimating the parameters of 
complex distributions. The method 
generates samples from the distribu- 
tion by running the Markov chain 
that models the problem for a long 
time (hopefully to equilibrium) and
then uses the ensemble of samples to
estimate the distribution. The states 
of the Markov chain are the pos- 
sible configurations of the problem. 
[WP:Markov_chain_Monte_Carlo] 
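
One common MCMC scheme is the random-walk Metropolis algorithm, sketched below for a 1D target known only up to a normalizing constant. The choice of Metropolis, the function name, the proposal width, the burn-in length and the target density are all assumptions for this example.

```python
import math
import random

def metropolis(log_p, x0, n_samples, step=1.0, burn_in=1000, seed=0):
    """Draw samples from a density proportional to exp(log_p(x))."""
    rng = random.Random(seed)
    x, samples = x0, []
    for i in range(burn_in + n_samples):
        proposal = x + rng.gauss(0.0, step)  # symmetric random-walk proposal
        log_ratio = log_p(proposal) - log_p(x)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal                     # accept; otherwise keep x
        if i >= burn_in:                     # discard the early, unconverged part
            samples.append(x)
    return samples

# Hypothetical target: unnormalized Gaussian with mean 3 and unit variance.
samples = metropolis(lambda x: -0.5 * (x - 3.0) ** 2, x0=0.0, n_samples=20000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

The sample mean and variance estimate the corresponding properties of the target distribution; successive samples are correlated, so more samples are needed than with independent draws.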


Markov decision process (MDP): A
process in which an agent can sense
a set of distinct states in the environment
and has a set of actions which it
can perform. If the agent is in state $s_t$
at time $t$ and takes action $a_t$, then it
transitions into state $s_{t+1}$; $P(s_{t+1}|s_t, a_t)$
defines the transition probabilities for
the MDP. The agent also receives a
reward $r(s_t, s_{t+1}, a_t)$ associated with
this transition. The Markov aspect of
the MDP arises from the fact that
the transition probability to the new
state $s_{t+1}$ depends only on the current
state and action, not on earlier
ones. The key problem for the agent
is to find a policy (a possibly stochastic
mapping from states to actions) so
as to maximize some cumulative function
of the rewards. The MDP formalizes
the problem of optimal control.
[SB98:3.6]


Markov network: See Markov random 
field (MRF). 


Markov process: A stochastic process in 
time whose past has no influence on 
the future if the present is specified. 
[Pap91:16.4] 


Markov property: A property for which 
the conditional probability of a future 
state depends only on the present state 
rather than anything which preceded 
it. [Bis06:11.2.1]


Markov random field (MRF): An 
undirected graphical model. Compare 
with conditional random field (CRF). 
An application is as an image model 
in which the value at a pixel can be 
expressed as a linear weighted sum 
of the values of pixels in a finite 
neighborhood about the original pixel 
plus an additive random noise value. 
[Mur12:6.3]


Markovian assumption: The inherent 
assumption underlying Markov chains: 
the values in any state are affected only 
by the values in the previous state. 
[SHB08:p. 799] 


Marr-Hildreth edge detector: A filter
for edge detection based on multi-scale
analysis of the zero crossing of the
Laplacian of a Gaussian operator.
[NA05:4.3.3]


Marr’s theory: Shorthand for “Marr’s
theory of the human vision system”.
Some of the key stages in this integrated
but incomplete theory are the
raw primal sketch, the full primal
sketch, the 2.5D sketch and 3D object
recognition. [BT88:Ch. 11]


mask: An $m \times n$ array of numbers or
symbolic labels, used as the smoothing
mask in a convolution operator, the target
in template matching, the kernel in
a mathematical morphology operation
etc. The figure shows a simple mask
for computing an approximation to the
Laplacian operator: [TV98:3.2]


matched filter: An operator that produces
a strong result in the output
image when it processes a portion of
the input image containing a pattern
for which it is “matched”. For example,
the filter could be tuned for the letter
“e” in a given font size and type style
or a particular face viewed at the right
scale. It is similar to template matching
except the matched filter can be tuned
for spatially separated patterns. This is
a signal-processing term imported into
image processing. [Jai89:9.12]


matching function: See similarity
metric.


matching method: A general term
for finding the correspondences
between two structures (e.g., surface
matching) or sets of features (see
the stereo correspondence problem).
[JKS95:15.5.2]


mathematical morphology operation:
A class of mathematically defined
image-processing operations, which
can apply to both binary images and
gray scale images, in which the result
is based on the spatial pattern of
the input data values rather than on
the values themselves. For example, a
morphological line-thinning algorithm
(see thinning operator) would identify
places in an image where a line description
(i.e., the pattern to match) was
represented by data more than 1 pixel
wide. The thinning algorithm would
choose one of the redundant pixels to
be set to 0. This figure shows a small
image patch before and after a thinning
operation: [SQ04:Ch. 7]


matrix: A mathematical structure of a
given number of rows and columns
with each entry usually containing a
number. It can be used to represent
a transformation between two coordinate
systems, record the covariance
of a set of vectors etc. A matrix for
rotating a 2D vector by $\pi/6$ radians is:

$\begin{pmatrix} \cos(\pi/6) & \sin(\pi/6) \\ -\sin(\pi/6) & \cos(\pi/6) \end{pmatrix} = \begin{pmatrix} 0.866 & 0.500 \\ -0.500 & 0.866 \end{pmatrix}$

[Jai89:2.7]


matrix-array camera: A 2D solid-state
chip sensor for imaging, such as those
found in typical current video cameras,
webcams and machine vision cameras.
[Gal90:2.1.3]


matte extraction: The derivation of
an alpha channel (i.e., transparency
information) or binary mask for an
image or video. Often the alpha channel,
or matte, may be used to delineate
an object of interest in a scene.
[Sze10:10.4]


matte surface: A surface whose
reflectance follows the Lambertian
surface model. [BB82:3.5.1]


matting: The process of combining mul- 
tiple images using a matte or mattes 
(binary masks) to indicate which parts 
of each image to use. [Sze10:3.1.3] 


maximal clique: A clique for which 
no further nodes exist that are con- 
nected to all nodes. Maximal cliques 
may have different sizes - the issue is 
maximality, not size. Maximal cliques 
are used in association graph match- 
ing algorithms to represent maximally 
matched structures. The graph in the 
figure has two maximal cliques, BCDE 
and ABD: [BB82:11.3.3] 


maximum a posteriori probability 
(MAP): The configuration of random 
variables in a statistical model that 
maximizes the posterior distribution. 
This term is often used in the con- 
text of parameter estimation, pose 
estimation or object recognition prob- 
lems, in which case we wish to 
estimate the parameters, position or 
identity (respectively) that have high- 
est posterior probability given the 
observed image data. However, this 
approach is, at best, an approxima- 
tion to the full Bayesian statistical 
model treatment, which involves inte- 
gration over the posterior distribution. 
[Mac03:p. 306] 


maximum entropy: The probability
density $p(x)$ that maximizes the
entropy of the distribution subject
to constraints of the form
$\int r_i(x)\, p(x)\, dx = c_i$ for $i = 1, \ldots, m$.
It can be shown that the solution
has the form
$p(x) = \exp(\lambda_0 + \sum_{i=1}^{m} \lambda_i r_i(x))$
where the $\lambda$s are chosen
so that $p(x)$ satisfies the constraints
and normalizes to 1. [CT91:11.1]


maximum entropy restoration: An 
image restoration technique based on 
maximum entropy. [PTVF92:18.7] 


maximum flow: The route with max- 
imum capacity (flow) from a source 
to a destination node in a network. 
[Gib85:4.2] 


maximum margin: In a binary
classification problem which is
linearly separable, let the perpendicular
distance from a separating
hyperplane to the nearest $+1$ class
point be $d_+$ and similarly $d_-$ for the nearest
class $-1$ point. The margin is defined
as $\min(d_+, d_-)$ and the support vector
machine algorithm determines the
hyperplane that gives rise to the maximum
margin with $d_+ = d_-$. The figure
shows the maximum margin line
between two point sets: [SS02:7.3]


maximum margin Hough transform:
An extension of the probabilistic
Hough transform which learns weights
for various features of objects in order
to increase the likelihood of obtaining
correct matches. [MM09]


MCMC: See Markov chain Monte Carlo. 


MDL: See minimum description length. 


mean: The mean $\mu$ of a random variable $X$
is the expectation value of $X$, i.e., $\mu = E[X]$.
This is also commonly known as
the statistical average. See also sample
mean. [MKB79:2.2.2]


mean and Gaussian curvature shape 
classification: A classification of a 
local (i.e., very small) surface patch 
(often a single pixel from a range
image) into one of a set of simple surface
shape classes based on the signs
of the mean curvature and Gaussian 
curvature. The standard set of shape 
classes is: 

{plane, concave cylinder, convex 
cylinder, concave ellipsoid, convex 
ellipsoid, saddle valley, saddle ridge, 
minimal}. Sometimes the classes sad- 
dle valley, saddle ridge and minimal 
are conflated into the single class 
“hyperbolic”. This figure summarizes 
the classifications based on the curvature
signs: [Rob01]


mean curvature: A mathematical char-
acterization for a component of local 
surface shape at a point on a smooth 
surface. Each point can be uniquely 
described by a pair of principal 
curvatures. The mean curvature is the 
average of the principal curvatures. 
[JKS95:13.3.2] 


mean field approximation: A 
method for approximate probabilistic 
inference in which a complex distri- 
bution is approximated by a simpler 
factorized one. [Bis06:10.1.1] 


mean filter: See mean smoothing
operator.


mean shift: An adaptive gradient ascent 
technique that operates by iteratively 
moving the center of a search window 
to the average of certain points within 
the window. [WP:Mean-shift] 
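
The mean shift iteration can be sketched in 1D with a flat (uniform) window, where "certain points" are simply those within a fixed radius of the current center. The function name, radius, tolerance and data are assumptions for illustration.

```python
def mean_shift(points, start, radius, max_iter=100, tol=1e-6):
    """Move a window center to the mean of the points inside it until stable."""
    center = start
    for _ in range(max_iter):
        inside = [p for p in points if abs(p - center) <= radius]
        if not inside:
            break                        # empty window: nothing to move toward
        new_center = sum(inside) / len(inside)
        if abs(new_center - center) < tol:
            break                        # converged to a local density mode
        center = new_center
    return center

# Hypothetical data with two clusters, near 1 and near 5.
points = [1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.9, 5.1]
mode_a = mean_shift(points, start=2.0, radius=1.5)
mode_b = mean_shift(points, start=4.0, radius=1.5)
```

Different starting positions converge to different modes, which is why the technique is often run from many starting points when used for clustering.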


mean shift filtering: A tracking technique
that tracks the contents of a specified
region from a frame of video. The
contents of this region are represented
by a weighted histogram and tracked
from frame to frame using gradient
ascent and the Bhattacharyya coefficient.
[CRM00]


mean smoothing operator: A noise 
reduction operator that can be applied 
to a gray scale image or to separate 
components of a multi-spectral image. 
The output value at each pixel is the 
average of the values of all pixels in 
a neighborhood of the input pixel. 
The size of the neighborhood deter- 
mines how much smoothing (or noise 
reduction) is done, but also how much 
blur of fine detail occurs. The figure 
shows (left) an image with Gaussian
noise with $\sigma = 13$ and (right) its mean
smoothing: [JKS95:4.3]
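
The mean smoothing operator can be sketched on a small gray scale image stored as a list of rows. The function name and the handling of image borders (the neighborhood is clipped to the image) are assumptions of this example.

```python
def mean_smooth(image, half=1):
    """Replace each pixel by the average over a (2*half+1)^2 neighborhood,
    clipped at the image border."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [image[j][i]
                    for j in range(max(0, y - half), min(h, y + half + 1))
                    for i in range(max(0, x - half), min(w, x + half + 1))]
            out[y][x] = sum(vals) / len(vals)
    return out

# A single bright pixel is spread over its neighborhood (blur of fine detail).
image = [[0, 0, 0],
         [0, 9, 0],
         [0, 0, 0]]
smoothed = mean_smooth(image)
```

A larger `half` gives more noise reduction but also more blurring, as the entry describes.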




mean value coordinates: The average 
coordinates of some planar surface. 
[Flo03] 


measurement matrix: A matrix con- 
taining, e.g., the coordinates of fea- 
tures for an object, image or video. 
[Sze10:7.3] 


measurement resolution: The degree 
to which two differing quantities can 
be distinguished by measurement. This 
may be the minimum spatial dis- 
tance that two adjacent pixels repre- 
sent (spatial resolution), the minimum 
time difference between visual observations
(temporal resolution) etc.
[WP:Resolution#Measurement_resolution]


medial: Pertaining to the midline. For 
example the medial line is the central
axis in a shape.
[WP:Medial_(disambiguation)]


medial axis skeletonization: See 
medial axis transform. 


medial axis transform: An operation 
on a binary image that transforms 
regions into sets of pixels that are the 
centers of circles that are bitangent 
to the boundary and that fit entirely
within the region. The value of each
point on the axis is the radius of the 
bitangent circle. This can be used to 
represent the region by a simpler axis- 
like structure and is most effective on 
elongated regions. The figure shows a 
region and its medial axis: [BB82:8.3.4] 


medial line: A curve going through the 
middle of an elongated structure. See 
also medial axis transform. The figure
shows a region and its medial line:
[BB82:8.3.4]


medial surface: The 3D generalization 
of the medial axis of a planar region. It 
is the locus of the centers of spheres 
that touch the surface of the volume at 
three or more points. [BB82:8.3.4] 


median filter: See median smoothing. 


median flow filtering: A noise 
reduction operation on vector data 
that generalizes the median smoothing 
on image data. The assumption is that 
the vectors in a spatial neighborhood 
about the current vector should be 
similar. Dissimilar vectors are rejected. 
The term “flow” arose through the 
filter’s development in the context of 
image motion. [SSCW98] 


median smoothing: An image noise 
reduction operator that replaces a 
pixel’s value by the median (mid- 
dle) of the sorted pixel values in its 
neighborhood. The figure shows an 
image with salt-and-pepper noise and 
the result of applying median smooth- 
ing: [JKS95:4.4] 
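
Median smoothing can be sketched in the same way as mean smoothing, replacing the average by the median. The function name and the clipped-border handling are assumptions of this example.

```python
from statistics import median

def median_smooth(image, half=1):
    """Replace each pixel by the median of its (2*half+1)^2 neighborhood,
    clipped at the image border."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [image[j][i]
                    for j in range(max(0, y - half), min(h, y + half + 1))
                    for i in range(max(0, x - half), min(w, x + half + 1))]
            out[y][x] = median(vals)
    return out

# One "salt" pixel in a constant region is removed completely.
image = [[10, 10, 10],
         [10, 255, 10],
         [10, 10, 10]]
cleaned = median_smooth(image)
```

Unlike averaging, the median discards the outlier instead of spreading it over the neighborhood, which is why it suits salt-and-pepper noise.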


medical image registration: A general
term for the registration of two
or more medical image types or an
atlas registration with some image
data. A typical registration would
align X-ray CAT and NMR images.
[WP:Image_registration#Applications]


membrane model: A surface-fitting 
model that minimizes a combination 
of the smoothness and the closeness of 
the fit surface to the original data. The 
surface class must have $C^0$ continuity
and thus it differs from the smoother
thin plate model that has $C^1$ continuity.
[LMY92]


mesh compression: The simplification 
of the set of vertices, edges and faces 
of a 3D triangular mesh model. [KG00]


mesh model: A tessellation of an image 
or surface into polygonal patches, 
much used in computer-aided design 
(CAD). The vertices of the mesh are 
called nodes, or nodal points. A pop- 
ular class of meshes is based on trian- 
gles, e.g., the Delaunay triangulation. 
Meshes can be uniform, i.e., all poly- 
gons are the same, or non-uniform. 
Uniform meshes can be represented 
by small sets of parameters. Surface 
meshes have been used for modeling 
freeform surfaces (e.g., faces and land- 
scapes). See also surface fitting. This 
icosahedron is a mesh model of a nearly 
spherical object: [JKS95:13.5] 


mesh subdivision: Methods for sub- 
dividing cells in a mesh model 
into progressively smaller cells. For 
example, see Delaunay triangulation. 
[WP:Mesh_subdivision]


message-passing process: A process 
where a computation is achieved 
by passing messages between differ- 
ent objects (or units). For example 
belief propagation can be used to 
carry out probabilistic inference in 
some probabilistic graphical models. 
[KF09:Ch. 10]


metameric colors: Colors that are 
defined by a limited number of chan- 
nels each of which integrates a range 
of the spectrum. Hence the same 
metameric color can be caused by 
a variety of spectral distributions. 
[Hor86:2.5.1] 


metric determinant: A measure of cur- 
vature. For surfaces, it is the square 
root of the determinant of the first 
fundamental form matrix of the sur- 
face. [JJK03] 


metric property: A visual property 
that is a measurable quantity, such 
as a distance or area. This contrasts 
with logical properties, such as image 
connectedness. [HZ00:1.7] 


metric reconstruction: Reconstruction 
of the 3D structure of a scene 
with correct spatial dimensions and 
angles. This contrasts with projective 
reconstruction. The figure shows the 
metrical and projective reconstruc- 
tions of a cube: 


[Figure: observed view and reconstructed view for a metrical reconstruction and for a perspective reconstruction]


The metrical projection looks “correct”
from all views but the perspective
projection may look “correct”
only from the views in which the
data was acquired.
[WP:Camera_auto-calibration#Problem_statement]


metric stratum: The set of similarity 
transformations (i.e., rigid transformations
with a scaling). They can be
recovered from image data without
external information such as some 
known length. [Pol00] 


metrical calibration: Calibration of 
intrinsic parameters and extrinsic 
parameters to enable metric 
reconstruction of a scene. [VH00]


Mexican hat operator: A convolution 
operator that implements either
a Laplacian of Gaussian operator 
or difference-of-Gaussians operator 
(which produce very similar results). 
The mask that can be used to imple- 
ment this convolution has a shape 
similar to a Mexican hat (a sombrero): 
[JKS95:5.4] 
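
A mask for the Laplacian of Gaussian form of the Mexican hat operator can be sampled as in the sketch below. The function name, the grid size and the sign convention are assumptions: with the sign used here the center is negative, and negating the kernel gives the upright "hat" shape.

```python
import math

def log_kernel(sigma=1.0, half=4):
    """Sampled Laplacian of Gaussian on a (2*half+1) x (2*half+1) grid."""
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            s = (x * x + y * y) / (2.0 * sigma ** 2)
            row.append(-(1.0 - s) * math.exp(-s) / (math.pi * sigma ** 4))
        kernel.append(row)
    return kernel

kernel = log_kernel()
center = kernel[4][4]                       # strongest response, at the middle
ring = kernel[4][6]                         # opposite sign in the surrounding ring
total = sum(sum(row) for row in kernel)     # close to zero for a large enough grid
```

The near-zero sum means the operator gives almost no response on regions of constant intensity.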


micro-mirror array: An array of tiny
mirrors that can be used to provide 
control of both the geometric and 
radiometric characteristics of images 
produced by the system, by elec- 
tronically altering their orientations. 
[NBB04] 


micron: One millionth of a meter; a 
micrometer. [Hec87:2.2] 


microscope: An optical device for 
observing small structures, such as 
organic cells, plant fibers or integrated 
circuits. [Hec87:5.7.5] 


microtexture: See statistical texture. 


mid-sagittal plane: The plane that sep- 
arates the body (and brain) into left 
and right halves. In medical imag- 
ing (e.g., nuclear magnetic resonance), 
it usually refers to a view of the 
brain sliced down the middle between 
the two hemispheres.
[WP:Sagittal_plane#Variations]


middle-level vision: A general term
referring to the stages of visual data 
processing between low-level vision 
and high-level vision. There are many 
variations of the definition of this term 
but a usable rule of thumb is that 
middle-level vision starts with descrip- 
tions of the contents of an image and 
results in descriptions of the features 
of the scene. Consequently, binocular 
stereo would be a middle-level vision 
process because it acts on image edge 
fragments to produce 3D scene frag- 
ments. [Ker99] 


millimeter-wave radiometric images: 
An image produced using sub- 
millimeter terahertz radiation (which 
is able to penetrate clothing). Most 
recently used within full body 
scanners at security checkpoints. 
[WP:Millimeter_wave_scanner] 


MIMD: See multiple instruction multiple 
data. 


minimal point: A point on a hyper- 
bolic surface where the two principal 
curvatures are equal in magnitude 
but opposite in sign, i.e., $\kappa_1 = -\kappa_2$.
[WP:Maxima_and_minima] 


minimal spanning tree: Consider a 
graph $G$ and a subset $T$ of the arcs
in G such that all nodes in G are still 
connected in T and there is exactly 
one path joining any two nodes. T is a 
spanning tree. If each arc has a weight 
(possibly constant), the minimal span- 
ning tree is the tree with smallest total 
weight: [DH73:6.10.2.1] 


[Figure: a graph and its minimal spanning tree]
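
One standard way to compute a minimal spanning tree is Kruskal's algorithm, sketched below. The entry itself does not name an algorithm, so this choice, the function name and the example graph are assumptions.

```python
def minimal_spanning_tree(n_nodes, arcs):
    """Kruskal's algorithm. arcs is a list of (weight, u, v) tuples with
    nodes numbered 0 .. n_nodes - 1; returns the arcs kept in the tree."""
    parent = list(range(n_nodes))

    def find(a):
        # Follow parent links to the representative of a's component.
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving keeps trees shallow
            a = parent[a]
        return a

    tree = []
    for w, u, v in sorted(arcs):           # consider arcs by increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                       # arc joins two components: no cycle
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

arcs = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]
tree = minimal_spanning_tree(4, arcs)
total_weight = sum(w for w, _, _ in tree)
```

For a connected graph the result has one arc fewer than the number of nodes, so there is exactly one path between any two nodes.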


minimum bounding rectangle: 
The rectangle of smallest area that 
surrounds a set of image data. 
[WP:Minimum_bounding_rectangle]


minimum description length (MDL):
A criterion for comparing descriptions
usually based on the implicit assumption
that the best description is the one
that is shortest (i.e., takes the fewest bits
to encode). The minimum description
usually requires several components:

• the models observed (e.g., whether
lines or circular arcs);

• the parameters of the models (e.g.,
the line endpoints);

• how the image data varies from the
models (e.g., explicit deviations or
noise model parameters);

• the remainder of the image that is
not explained by the models.

[FP03:16.3.4]


minimum distance classifier: Given
an unknown sample with feature vector
$\vec{x}$, select the class $c$ with model
vector $\vec{m}_c$ for which the distance
$\| \vec{x} - \vec{m}_c \|$ is smallest. [SB11:11.8]
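
A minimum distance classifier can be sketched in a few lines. The function name, the use of the Euclidean norm and the example model vectors are assumptions for illustration; `math.dist` requires Python 3.8 or later.

```python
import math

def classify(x, models):
    """Return the label of the model vector nearest to x (Euclidean distance).
    models maps each class label to its model vector."""
    return min(models, key=lambda c: math.dist(x, models[c]))

# Hypothetical model vectors for two classes.
models = {"small": [1.0, 1.0], "large": [5.0, 5.0]}
label_a = classify([1.5, 0.5], models)
label_b = classify([4.0, 6.0], models)
```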

minimum spanning tree: See minimal 
spanning tree. 


MIPS: millions of instructions per sec- 
ond. [WP:Instructions_per_second] 


mirror: A surface with specular 
reflection for which incident light is 
reflected only at the same angle and in 
the same plane as the surface normal. 
[Hec87:5.4] 


miss-one-out test: See leave-one-out 
test. 


missing data: Data that is unavailable, 
hence requiring it to be estimated. 
For example, a moving person may 
become occluded resulting in missing 
position data for a number of frames. 
[FP03:16.6.1] 


missing pixel: A pixel for which no 
value is available (e.g., if there was a 
problem with a sensing element in the 
image sensor). [FP03:16.6.1] 


mixed pixel: A pixel whose measure- 
ment arises from more than one scene 
phenomenon. For example, a pixel 
that observes the edge between two 
regions has a gray scale level that lies 
between the gray levels of the two 
regions. [PMPP04] 


mixed Poisson-Gaussian noise: 
Image noise which is composed of 
both signal-dependent Poisson noise 
and signal-independent Gaussian 
noise. [LBU11] 


mixed reality: Image data that con- 
tains both original image data and 
overlaid computer graphics. See also 
augmented reality. This figure shows
an example of mixed reality, where
the butterfly is a graphical object
added to the image of the small robot:
[WP:Mixed_reality] 


mixing proportion: In a mixture 
model, the probability $\pi_i$ of selecting
source $i$, with $\sum_i \pi_i = 1$. [Bis06:Ch. 9]


mixture model: A probabilistic repre- 
sentation in which more than one 
distribution is combined, modeling a 
situation where the data may arise 
from different sources or have different 
behaviors, each with different probability
distributions. The overall distribution
thus has the form $p(x) = \sum_i \pi_i \, p_i(x)$,
where $p_i(x)$ denotes the density of source $i$
and $\pi_i$ is its mixing proportion. A mixture model
may be fitted to data using the 
expectation maximization (EM) algo- 
rithm. [Bis06:Ch. 9] 
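The mixture density can be evaluated directly from its definition. The sketch below assumes one-dimensional Gaussian component densities, which is a common choice but not required by the entry; the function names and parameter values are illustrative. Fitting the parameters with EM is not shown.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1D Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, proportions, means, stds):
    """p(x) = sum_i pi_i * p_i(x), where the mixing proportions pi_i sum to 1."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("mixing proportions must sum to 1")
    return sum(pi * gaussian_pdf(x, m, s)
               for pi, m, s in zip(proportions, means, stds))


# Hypothetical two-source mixture evaluated at x = 0.
density = mixture_pdf(0.0, [0.3, 0.7], [0.0, 4.0], [1.0, 1.0])
```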


maximum likelihood estimation: The 
value of the parameters of a statistical 
model that maximize the likelihood 
function. For example, the maximum 
likelihood estimate of the mean of a 
Gaussian distribution is the average of 
the observed samples drawn from that 
distribution. [Was04:9.3] 


modal deformable model: A 
deformable model based on modal 
analysis (i.e., study of the shapes that 
an object can assume). [GKP+07] 


mode filter: A noise reduction fil- 
ter that, for each pixel, outputs the 
mode (most common) value in its 
local neighborhood. The figure shows 
(left) an image with salt-and-pepper
noise and (right) the filtered version: 
[NA05:3.5.3] 
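A small sketch of a mode filter on an image stored as a list of rows. The square neighborhood, the clipping of the window at the image border and the tie-breaking rule (the first value encountered wins) are assumptions of this sketch rather than part of the definition.

```python
from collections import Counter


def mode_filter(image, radius=1):
    """Replace each pixel by the most common value in its (2*radius+1)^2 neighborhood."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # The window is clipped at the image border (an assumed border policy).
            window = [image[rr][cc]
                      for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                      for cc in range(max(0, c - radius), min(cols, c + radius + 1))]
            out[r][c] = Counter(window).most_common(1)[0][0]
    return out


# A single salt-noise pixel in a constant region is replaced by the local mode.
noisy = [[10, 10, 10],
         [10, 255, 10],
         [10, 10, 10]]
filtered = mode_filter(noisy)
```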


model: An abstract representation of 
some object or class of objects. 
[WP:Model] 


model acquisition: The process of 
learning a model, usually based on 
observed instances or examples of the 
structure being modeled. This may be 
simply learning the parameters of a 
distribution from examples. For exam- 
ple, one might learn the image tex- 
ture properties that distinguish tumor- 
ous cells from normal cells. Alterna- 
tively, the structure of the object might 
be learned as well, such as construct- 
ing a model of a building from a 
video sequence. Another type of model 
acquisition involves learning the prop- 
erties of an object, such as the proper- 
ties and relations that define a square as 
compared to other geometric shapes. 
[FP03:21.3] 


model-assisted search: An algorithm 
which searches for a match (e.g., face 
recognition) and uses a higher level 
model (e.g., a 3D model of the face) 
to assist the process. 


model base: A database of models usu- 
ally used as part of an identification 
process. [JKS95:15.1] 


model base indexing: Selecting one or 
more candidate models from a model 
base of structures known by the sys- 
tem. This is usually to eliminate exhaus- 
tive testing with every member of the 
model base. [FP03:16.3] 


model-based coding: A method of 
encoding the contents of an image (or 
video sequence) using a pre-defined or 
learned set of models. This could pro- 
duce a more compact description (see 
model-based compression) or a sym- 
bolic description of the image data. 
For example, a Mondrian-style image 
could be encoded by the positions, 
sizes and colors of the colored rectangular
regions. [AH95]


model-based compression: An appli- 
cation of model-based coding for the 
purpose of reducing the amount of 
memory required to describe an image 
while still allowing reconstruction of 
the original image. [Umb98:5.3.6] 


model-based feature detection: Using 
a parametric model of a feature to 
locate instances of the feature in an 
image. For example, a parametric edge 
detector uses a parameterized model of 
a step edge that encodes edge direction 
and edge magnitude. [BNM98] 


model-based object recognition: The 
identification of unknown objects 
through the use of known models. 
[FP03:Ch. 18] 


model-based recognition: Identifi- 
cation of the structures in an image 
by using some internally represented 
model of the objects known to the 
computer system. The models are 
usually geometric models. The recog- 
nition process finds image features 
that match the model features with the 
right shape and position. The advan- 
tage of model-based recognition is that 
the model encodes the object shape 
thus allowing predictions of image 
data and less chance of coincidental 
features being falsely recognized. 
[TV98:10.1] 


model-based segmentation: An image 
segmentation process that uses 
geometric models to partition the 
image into different regions. For exam- 
ple, aerial images could have the visible 
roads segmented using a GIS model of 
the road network. [FP03:Ch. 14] 


model-based tracking: An image 
tracking process that uses models to 
locate the position of moving targets 
in an image sequence. For example, 
the estimated position, orientation 
and velocity of a modeled vehicle in 
one image allows a strong prediction 
of its location in the next image in the 
sequence. [FP03:Ch. 17] 


model-based vision: A general term for 
using models of the objects expected 
to be seen in the image data to help 
with image analysis. The model allows,
among other things, prediction of addi- 
tional model feature positions, verifica- 
tion that a set of features could be part 
of the model and understanding of the 
appearance of the model in the image 
data. [FP03:Ch. 18] 


model building: See also model 
acquisition. The process of construct- 
ing a geometric model usually based on 
observed instances or examples of the 
structure being modeled, such as from 
a video sequence. 


model exploitation: A directed explo- 
ration approach where both positive 
and negative knowledge are used to 
guide the search process. 


model exploration: Undirected explo- 
ration of a model (as opposed to model 
exploitation). 

model fitting: See model registration. 

model inference: The process of
parameter estimation in a model or
of model selection over a number of
models.


model invocation: See model base 
indexing. 


model matching: The identification 
of a corresponding, typically known, 
model for some unknown data. For 
example, matching an unknown 2D or 
3D shape extracted from an image with 
one from a database of known object 
models. See also object recognition. 
[TM87] 


model order: The number of variables 
used to describe the model, e.g., the 
number of lagged variables used in an 
autoregressive model. [SVR08]


model order selection: A special case 
of model selection where there are a 
number of similar models of increasing 
complexity, e.g., polynomial regres- 
sion with polynomials of different 
order or an autoregressive model with 
different lengths of history. 


model parameter learning: See
parameter estimation.


model parameter update: When carrying
out model parameter learning,
the performance of the model is
typically optimized by an iterative algorithm,
leading to updates of the model
parameters on each iteration.


model reconstruction: See model
acquisition.


model registration: A general term for 
aligning a geometric model to a set of 
image data. The process may require 
estimating the rotation, translation and 
scale that maps a model onto the image 
data. There may also be shape parame- 
ters, such as model length, that need to 
be estimated. The fitting may need to 
account for perspective distortion. The 
figure shows a 2D model registered on 
an intensity image of the same part: 
[Nev82:3.3] 


model robustness: A statistical model 
whose fitting is not significantly influ- 
enced by outliers. See also robust 
statistics. [GCSR95:Ch. 12] 


model selection: 1) The task of select- 
ing one of a number of candidate 
models. The goal is to select the “best” 
model of the data; for a supervised 
learning task, best could be defined in 
terms of the predictive performance 
on test data. An important issue 
here is over-fitting - performance 
on the training set might not be a 
good indicator of performance on a 
test set. Typically models with more 
parameters have a greater danger of 
over-fitting, so methods such as the 
Akaike Information Criterion, Bayesian 
information criterion or minimum 
description length, which penalize the 
number of parameters can be used. 
Bayesian model comparison can be 
carried out on the basis of the marginal 
likelihood and prior probabilities for 
each model. Probably the most com- 
mon way to select predictive models 
is via cross-validation. It is not always 
necessary to select one model; one 
might combine the predictions of
different models, see e.g., ensemble
learning or Bayesian model averaging.
[HTF08:Ch. 7]

2) See model base indexing.
[FP03:16.3]


model structure: The organization of
primitives such as surfaces or edge seg-
ments within a representation (model).
[WP:Geometric_modeling]


model topology: A mathematical field 
considering properties of models 
which are invariant under continuous 
deformations of the objects being mod- 
eled, e.g., connectivity. 


modulation transfer function (MTF): 
Informally, a measure of how well spa- 
tially varying patterns are observed by 
an optical system. More formally, in a 
2D image, let $X(f_h, f_v)$ and $Y(f_h, f_v)$
be the Fourier transforms of the input
$x(h, v)$ and output $y(h, v)$ images.
Then, the MTF of a horizontal and
vertical spatial frequency pair $(f_h, f_v)$
is $|H(f_h, f_v)| / |H(0, 0)|$, where
$H(f_h, f_v) = Y(f_h, f_v) / X(f_h, f_v)$. This
is also the magnitude of the optical 
transfer function. [Jai89:2.6] 


moiré fringe: An interference pattern 
that is observed when spatially sam- 
pling, at a given spatial frequency, a 
signal that has a slightly different spa- 
tial frequency. The result is a set of light 
and dark bands in the observed image. 
As well as causing image degradation, 
this effect can also be used in range 
sensors, where the fringe positions 
give an indication of surface depth. The 
figure shows typical observed fringe 
patterns: [Jai89:4.4] 


moiré interferometry: A technique for
contouring surfaces that works by 
projecting a fringe pattern (e.g., of 
straight lines) and observing this pat- 
tern through another grating. This 
effect can be achieved in other ways 
as well. The technique is useful for 
measuring extremely small stress and 
distortion movements.
[WP:Moire_pattern#Interferometric_approach]


moiré pattern: See moiré fringe. 


moiré topography: A method for mea- 
suring the local shape of a surface by 
analyzing the spacing of moiré fringes 
on the target surface. 


moment: A method for summarizing the
distribution of pixel positions or values.
Moments are a parameterized family
of values. For example, if $I(x, y)$
is a binary image then $\sum_{x,y} I(x, y)\, x^p y^q$
computes its $pq$th moment $m_{pq}$. See
also gray scale moment and moment 
of intensity. [Jai89:9.8] 
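The moment sum can be computed directly. In this sketch the image is a list of rows, x is taken as the column index and y as the row index; that convention and the function name are assumptions. The zeroth moment gives the area of a binary region and the first moments give its centroid.

```python
def moment(image, p, q):
    """m_pq = sum over all pixels of I(x, y) * x**p * y**q (x = column, y = row)."""
    return sum(value * (x ** p) * (y ** q)
               for y, row in enumerate(image)
               for x, value in enumerate(row))


binary = [[0, 0, 0, 0],
          [0, 1, 1, 0],
          [0, 1, 1, 0]]
area = moment(binary, 0, 0)
centroid = (moment(binary, 1, 0) / area, moment(binary, 0, 1) / area)
```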


moment characteristic: See moment 
invariant. 


moment invariant: A function of image
moment values that keeps the same
value even if the image is transformed
in some manner. For example, the
value $\frac{1}{A^2}(\mu_{20} + \mu_{02})$ is invariant
where $\mu_{pq}$ are central moments of a
binary image region and A is the area 
of the region. This value is a constant 
even if the image data has been sub- 
ject to translation, rotation or scaling. 
[Jai89:9.8] 
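The sketch below evaluates the quantity $(\mu_{20} + \mu_{02}) / A^2$ for a small binary region and for translated and 90-degree-rotated copies of it. The function names and test shapes are illustrative. On a pixel grid, invariance under scaling and arbitrary rotation holds only approximately, so only the exact cases are shown.

```python
def central_moment(image, p, q):
    """mu_pq: moment of a binary image taken about the region centroid."""
    area = sum(map(sum, image))
    cx = sum(v * x for row in image for x, v in enumerate(row)) / area
    cy = sum(v * y for y, row in enumerate(image) for v in row) / area
    return sum(v * (x - cx) ** p * (y - cy) ** q
               for y, row in enumerate(image)
               for x, v in enumerate(row))


def invariant(image):
    """(mu_20 + mu_02) / A^2, where A is the region area."""
    area = sum(map(sum, image))
    return (central_moment(image, 2, 0) + central_moment(image, 0, 2)) / area ** 2


shape = [[1, 1, 1, 0],
         [1, 1, 1, 0],
         [0, 0, 0, 0]]
translated = [[0, 0, 0, 0],
              [0, 1, 1, 1],
              [0, 1, 1, 1]]
rotated = [list(column) for column in zip(*shape)]  # transpose, i.e., a 90-degree turn plus a flip
values = [invariant(shape), invariant(translated), invariant(rotated)]
```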


moment of intensity: An image 
moment value that takes account of 
the gray scales of the image pixels as 
well as their positions. For example, 
if G(x, y) is a gray scale image, then 
$\sum_{x,y} G(x, y)\, x^p y^q$ computes its $pq$th
moment of intensity $g_{pq}$. See also gray
scale moment. [CC84] 


Mondrian: A famous visual artist from
the Netherlands, whose later paint-
ings were composed of adjacent
rectangular blocks of constant color
(i.e., without shading). This style of
image has been used for much color
vision research and, in particular,
color constancy because of its simpli-
fied image structure, without shading,
specularity, shadow or light source.
[Hor86:9.2]


monochrome: Containing only shades 
of a single color, usually gray, going 
from pure black to pure white. 
[WP:Monochrome] 


monocular: Using a single camera, 
sensor or eye. This contrasts with 
binocular and multi-ocular stereo 
where more than one sensor is used. 
Sometimes there is also the implication 
that the image data is acquired from 
a single viewpoint as a single camera 
taking images over time is mathemat- 
ically equivalent to multiple cameras. 
[BB82:2.2.2] 


monocular depth cue: Image evi- 
dence that indicates that one sur- 
face may be closer to the viewer 
than another. For example, motion 
parallax or occlusion relationships give 
evidence of relative depths.
[WP:Depth_perception#Monocular_cues]


monocular visual space: The visual 
space behind the lens in an opti- 
cal system. This space is commonly 
assumed to be without structure but 
scene depth can be recovered from 
the defocus blurring that occurs in this 
space. [Gro83] 


monogenic wavelets: Wavelet func- 
tions in which complex valued func- 
tions are used rather than negative fre- 
quency components. [OM09] 


monotonicity: A sequence of val- 
ues or a function that is either 
continuously increasing (mono- 
tone increasing) or continuously 
decreasing (monotone decreasing). 
[WP:Monotonic_function] 


Monte Carlo methods: Methods that 
use random samples to obtain approximate
results. For example, the integral
$\int f(x)\, p(x)\, dx$ (the expectation value
of the function $f(x)$ under the density
$p(x)$) can be approximated using $N$
samples $x^{(1)}, \ldots, x^{(N)}$ drawn from $p(x)$
as $\frac{1}{N} \sum_{i=1}^{N} f(x^{(i)})$. [Bis06:Ch. 11]
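A minimal sketch of the sample-average approximation, assuming the density is a standard normal and the function is f(x) = x^2, whose expectation is exactly 1. The function name, the `sampler` callback and the fixed seed are choices made for this illustration.

```python
import random


def monte_carlo_expectation(f, sampler, n, seed=0):
    """Approximate E[f(x)] by (1/N) * sum_i f(x_i) with x_i drawn by `sampler`."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return sum(f(sampler(rng)) for _ in range(n)) / n


# E[x^2] under a standard normal density equals its variance, 1.
estimate = monte_carlo_expectation(lambda x: x * x,
                                   lambda rng: rng.gauss(0.0, 1.0),
                                   100_000)
```

The error of such an estimate typically shrinks in proportion to 1/sqrt(N), so many samples are needed for high accuracy.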

Moravec interest point operator: An
operator that locates interest points
at pixels where neighboring intensity
values change greatly in at least one
direction. These points can be used for 
stereo matching or feature point track- 
ing. The operator computes the sum 
of the squares of pixel differences in 
a line vertically, horizontally and both 
diagonal directions in a 5 x 5 window 
about the given pixel. The minimum of 
these four values is selected and then 
all values that are not local maxima or 
are below a given threshold are sup- 
pressed. The figure shows the interest 
points found by the Moravec operator 
as white dots on the original image: 
[JKS95:14.3] 
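The steps of the operator can be sketched as follows. The code follows the description above: four directional sums of squared differences over a 5 x 5 window, the minimum of the four, then thresholding and suppression of values that are not local maxima. Skipping pixels within three pixels of the border, the 3 x 3 suppression neighborhood, and the function and parameter names are assumptions of this sketch.

```python
def moravec(image, threshold):
    """Return (row, col) interest points of a gray scale image given as a list of rows."""
    rows, cols = len(image), len(image[0])
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]  # horizontal, vertical, two diagonals
    response = [[0] * cols for _ in range(rows)]
    for r in range(3, rows - 3):
        for c in range(3, cols - 3):
            sums = []
            for dr, dc in directions:
                total = 0
                for rr in range(r - 2, r + 3):      # 5 x 5 window about (r, c)
                    for cc in range(c - 2, c + 3):
                        diff = image[rr + dr][cc + dc] - image[rr][cc]
                        total += diff * diff
                sums.append(total)
            response[r][c] = min(sums)              # weakest direction decides
    points = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            value = response[r][c]
            neighbors = [response[r + i][c + j]
                         for i in (-1, 0, 1) for j in (-1, 0, 1) if (i, j) != (0, 0)]
            if value > threshold and value >= max(neighbors):
                points.append((r, c))
    return points


# Hypothetical test image: a bright quadrant whose corner lies near (8, 8).
image = [[100 if r >= 8 and c >= 8 else 0 for c in range(16)] for r in range(16)]
corners = moravec(image, threshold=10000)
```

Pixels on the straight edges of the quadrant give a zero response because the intensity does not change along the edge direction, so only the corner survives.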


morphable models: A model which can 
be adapted to fit data (e.g., a mor- 
phable 3D face model which is altered 
to fit a previously unseen picture of an 
unknown face). [JP98] 


morphing: The process of deforming 
from one image or shape to another 
in a seamless manner through a series 
of intermediate images, transforming 
shape and color: [Sze10:3.6.3] 




morphological gradient: A gray scale
mathematical morphology operation
applied to gray scale images that results
in an output image similar to the
standard intensity gradient. The gradient
is calculated by $\frac{1}{2}(D_G(A, B) - E_G(A, B))$
where $D_G()$ and $E_G()$
are the gray scale dilate and erode,
respectively, of image A by kernel B.
[CS09:4.5.5] 
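A sketch of the morphological gradient as half the difference between the gray scale dilation and erosion. It assumes a flat square kernel B, implemented as a local maximum and minimum, and a window clipped at the image border; the function names are illustrative.

```python
def _window(image, r, c, radius):
    """Pixel values in the square neighborhood of (r, c), clipped at the border."""
    rows, cols = len(image), len(image[0])
    return [image[rr][cc]
            for rr in range(max(0, r - radius), min(rows, r + radius + 1))
            for cc in range(max(0, c - radius), min(cols, c + radius + 1))]


def dilate(image, radius=1):
    """Gray scale dilation by a flat square kernel: local maximum."""
    return [[max(_window(image, r, c, radius)) for c in range(len(image[0]))]
            for r in range(len(image))]


def erode(image, radius=1):
    """Gray scale erosion by a flat square kernel: local minimum."""
    return [[min(_window(image, r, c, radius)) for c in range(len(image[0]))]
            for r in range(len(image))]


def morphological_gradient(image, radius=1):
    """Half the difference between the dilated and eroded images."""
    dilated, eroded = dilate(image, radius), erode(image, radius)
    return [[(d - e) / 2 for d, e in zip(d_row, e_row)]
            for d_row, e_row in zip(dilated, eroded)]


# A vertical step edge of height 10 gives a response of 5 on both sides of the step.
step = [[0, 0, 0, 10, 10, 10] for _ in range(3)]
gradient = morphological_gradient(step)
```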


morphological image processing:
Processing of images using mathematical
morphology operations, such as
the erode operator, the dilate operator 
etc. This approach considers images as 
point sets. [Sze10:3.3.2] 


morphological segmentation: Using
mathematical morphology operations
applied to binary images to extract iso- 
lated regions of the desired shape. The 
desired shape is specified by the mor- 
phological kernel. The process could 
also be used to separate touching 
objects. [MB90] 


morphological smoothing: A gray
scale mathematical morphology operation
applied to gray scale images
that results in an output image similar
to that produced by standard noise
reduction. The smoothing is calculated
by $C_G(O_G(A, B), B)$ where $C_G()$ and
$O_G()$ are the gray scale close operation
and open operation, respectively, of
image A by kernel B. [SKP96] 
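A sketch of morphological smoothing as an opening followed by a closing. It assumes a flat square kernel, so erosion and dilation reduce to a local minimum and maximum, and it clips the window at the image border, which makes results near the border only approximate. All function names are illustrative.

```python
def _local(image, radius, reduce_fn):
    """Apply reduce_fn (min or max) over each pixel's square neighborhood."""
    rows, cols = len(image), len(image[0])
    return [[reduce_fn(image[rr][cc]
                       for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                       for cc in range(max(0, c - radius), min(cols, c + radius + 1)))
             for c in range(cols)]
            for r in range(rows)]


def open_image(image, radius=1):
    """Opening: erosion followed by dilation (removes small bright details)."""
    return _local(_local(image, radius, min), radius, max)


def close_image(image, radius=1):
    """Closing: dilation followed by erosion (removes small dark details)."""
    return _local(_local(image, radius, max), radius, min)


def morphological_smoothing(image, radius=1):
    """Closing of the opening of the image by the same flat kernel."""
    return close_image(open_image(image, radius), radius)


# Constant image with one bright spike and one dark pit, both away from the border.
noisy = [[10] * 9 for _ in range(9)]
noisy[2][2] = 200
noisy[6][6] = 0
smoothed = morphological_smoothing(noisy)
```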


morphological transformation: One
of a large class of binary image and gray
scale image transformations whose primary
characteristic is that they react to
the pattern of the pixel values rather
than the values themselves. Examples
include the dilate operator, the erode
operator, skeletonization, the thinning
operator etc. The figure shows (left)
an image and (right) an application of 
the open operator, when using a disk- 
shaped structuring element 11 pixels 
in diameter: [Jai89:9.9] 


morphology: The shape of a structure.
See also mathematical morphology
operation. [Jai89:9.9] 


morphometry: Techniques for the
measurement of shape.
[WP:Morphometrics]


mosaic: The construction of a larger
image from a collection of partially
overlapping images taken from
different viewpoints. The reconstructed
image could have different
geometries, e.g., as if seen from a 
single perspective viewpoint or as if 
seen from an orthographic viewpoint. 
See also image mosaic. [Sch89:Ch. 2] 


most probable explanation: Given 
a joint probability distribution over a 
set of random variables defined by a 
probabilistic graphical model, knowl- 
edge of the values of some set of vari- 
ables e (the evidence) gives rise to the 
conditional distribution of the remain- 
ing variables x, i.e., p(x|e). The most 
probable explanation x* is the config- 
uration of x that maximizes p(x|e). See 
also maximum a posteriori probability. 
[Pea88:5.3] 


mother wavelet: The prototype func- 
tion (wavelet) used within a wavelet 
transformation, from which all other 
functions (wavelets) are derived. 
[PTVF92:13.10] 


motion: In the context of computer 
vision, refers to analysis of an image 
sequence where the camera position 
or scene structure changes over time. 
[BB82:Ch. 7] 


motion analysis: Analysis of an image 
sequence in order to extract useful 
information, such as the shape of 
the observed scene, the figure-ground 
separation, the egomotion estimation 
and estimates of a target’s position and 
motion. [BB82:7.2-7.3] 


motion blur: The blurring of an image
that arises when either the camera or
something in the scene moves while
the image is being acquired. The figure
shows the blurring that occurs when
an object moves during image capture:
[WP:Motion_blur]


motion capture: The process of captur-
ing the way in which an object, per-
son or animal moves through analysis
of video of the object, person or ani-
mal. The simplest way to achieve this
is to attach markers to key points of
the object, person or animal, although
markerless motion capture is also pos-
sible. [WP:Motion_capture]


motion coding: 1) A component of
video sequence compression in which
efficient methods are used to represent
movement of image regions between
video frames.
2) A term for neural cells tuned to
respond for direction and speeds of
image motion. [WP:Motion_coding]


motion compensation: A technique
used in the encoding of video where
a picture is described with respect to
some other (previous or future) pic-
ture together with motion informa-
tion. This allows for high levels of
video compression, as the frame-to-
frame change is small in most videos.
[WP:Motion_compensation]


motion deblurring: The removal of
blurring effects of either camera or
scene motion during image capture.
See also image restoration. [SJA08]


motion descriptor: A descriptor of the
motion of an object, point, region
or feature in a video.
[WP:Visual_descriptors]


motion detection: Analysis of an image
sequence to determine if or when
something in the observed scene
moves. See also change detection.
[JKS95:14.1]


motion direction profile: A summary 
of the directions of the motion vectors 
(e.g., optical flow) in a region or image. 


motion discontinuity: When the 
smooth motion of the camera or some- 
thing in the scene changes, such as 
the speed or direction of motion. 
Another form of motion discontinu- 
ity is between two groups of adjacent 
pixels that have different motions. 
[BYJF97] 


motion estimation: Estimating the
motion direction and speed of the
camera or something in the scene.
[Sch89:Ch. 5]


motion factorization: Given a set
of tracked feature points through
an image sequence, a measurement 
matrix can be constructed. This matrix 
can be factored into component matri- 
ces that represent the shape and 3D 
motion of the structure up to a 3D 
affine transformation (which is remov- 
able using knowledge of the intrinsic 
parameters). [TK92] 


motion feature: Use of local motion
information as a feature to describe
object action. These features can be 
used, e.g., to recognize specific human 
activities. [GBS+07] 


motion field: The projection of the
relative motion vector for each scene
point onto the image plane. In
many circumstances this is closely
related to the optical flow, but it
may differ as image intensities can
also change because of illumination
changes. Similarly, motion of a uni-
formly shaded region is not observable
locally because there is no change in
image intensity data. [TV98:8.2]


motion history image (MHI): An
image which shows the changing pix-
els from the current frame of a video
together with aged versions of the
previously changing pixels. The figure
shows (left) a video and (right) the MHI
made from it: [Dav01]

[Figure panels: Original; Background Image; Moving Pixels; Motion History Image]


motion history volume: A
spatio-temporal volume that encodes
moving pixels over some number of
frames. [WRB06]


motion layer segmentation: The
segmentation of an image into dif-
ferent regions where the motion is
locally consistent. The layering effect
is most noticeable when the observer
is moving through a scene with objects
at different depths (causing different
amounts of parallax) some of which
might also be moving. [TV98:8.6]


motion model: A mathematical model
of types of motion allowable for the
target object or camera, such as linear
motion along the optical axis with con-
stant velocity. Another example might
allow velocities and accelerations in
any direction, with occasional discon-
tinuities, such as for a bouncing ball.
[BB82:Ch. 7]


motion moment: A moment computed
from motion vectors rather than image
intensities or pixel positions.


motion parallax: The apparent relative
displacement of objects when viewed
from different viewpoints, caused by
movement of the camera. See also
parallax. [WP:Parallax]


motion representation: See motion
model.


motion segmentation: See motion
layer segmentation.


motion sequence analysis: The class of
computer vision algorithms that process
sequences of images captured
close together in space and time, typi- 
cally by a moving camera. These analy- 
ses are often characterized by assump- 
tions on temporal coherence that sim- 
plify computation. [BB82:7.3] 


motion smoothness constraint: The
assumption that nearby points in the
image have similar motion directions
and speeds or similar optical flow. This
constraint is based on the fact that
adjacent pixels generally record data
from the projection of adjacent surface 
patches from the scene. These scene 
components will have similar motion 
relative to the observer. This assump- 
tion can help reduce motion estima- 
tion errors or constrain the ambiguity 
in optical flow estimates arising from 
the aperture problem. [HL93] 


motion tracking: Identification of the 
same target feature points through an 
image sequence. This could also refer 
to tracking complete objects as well 
as feature points, including estimating 
the trajectory or motion parameters of 
the target. [FP03:Ch. 17] 


motion understanding: Analysis of the 
visual motion in an image to compute 
an understanding of the scene motion 
information. [Tso11] 


movement analysis: A general term for 
analyzing an image sequence of a scene
where objects are moving. It is often 
used for analysis of human motion, 
such as for people walking or using 
sign language. [BB82:7.2-7.3] 


moving average smoothing: A form of 
noise reduction that occurs over time 
by averaging the most recent images 
together. It is based on the assumption 
that variations in time of the observed 
intensity at a pixel are random. Averag- 
ing the values will thus produce inten- 
sity estimates closer to the true (mean) 
value. [DH73:7.4] 
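A sketch of moving average smoothing over the most recent frames of a sequence. The class name, the window length and the use of equal weights are assumptions; each frame is a list of rows of intensities.

```python
from collections import deque


class MovingAverage:
    """Per-pixel average of the most recent `window` frames."""

    def __init__(self, window):
        self.frames = deque(maxlen=window)  # the oldest frame is dropped automatically

    def update(self, frame):
        """Add a new frame and return the current averaged image."""
        self.frames.append(frame)
        n = len(self.frames)
        return [[sum(f[r][c] for f in self.frames) / n
                 for c in range(len(frame[0]))]
                for r in range(len(frame))]


# One-pixel "video" whose true intensity is about 12 with random fluctuation.
smoother = MovingAverage(window=3)
outputs = [smoother.update([[value]]) for value in (10, 14, 12, 20)]
```

A longer window reduces noise further but responds more slowly when the scene actually changes.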


moving light display: An image 
sequence of a darkened scene contain- 
ing objects with attached point light 
sources. The light sources are observed 
as a set of moving bright spots. This 
sort of image sequence was used in 
the early research on structure from 
motion. [Joh73] 


moving object detection: Analyzing 
an image sequence, usually with a 
stationary camera, to detect whether 
any objects in the scene move. 
[JKS95:14.1] 


moving observer: A camera or other 
sensor that is moving. Moving 
observers have been extensively used 
in research on structure from motion. 
[Nal93:Ch. 8]


MPEG: Moving Picture Experts Group.
A group developing standards for cod- 
ing digital audio and video, as used in 
video CD, DVD and digital television. 
This term is often used to refer to media 
that is stored in the MPEG-1 format. 
[WP:Moving_Picture_Experts_Group] 


MPEG-2: A standard formulated by the 
ISO Moving Picture Experts Group
(MPEG), a subset of ISO Recommen- 
dation 13818, meant for transmission 
of studio-quality audio and video. It 
covers four levels of video resolution. 
[WP:MPEG-2] 


MPEG-4: A standard formulated by the 
ISO Moving Picture Experts Group
(MPEG), originally concerned with 
similar applications as H.263 (very low 
bit rate channels, up to 64 Kbps). 
Subsequently extended to encompass 
a large set of multimedia applica- 
tions, including over the Internet. 
[WP:MPEG-4] 


MPEG-7: A standard formulated by the 
ISO Moving Picture Experts Group
(MPEG). Unlike MPEG-2 and MPEG-4, 
which deal with compressing multi- 
media contents within specific appli- 
cations, it specifies the structure and 
features of the compressed multime- 
dia content produced by the different 
standards, e.g., to be used in search 
engines. [WP:MPEG-7] 


MPEG-7 descriptor: Part of the MPEG-7 
video encoding standard ISO/IEC 
15938. The MPEG-7 descriptors 
describe low level video and audio 
features such as color, texture, motion 
or shape. [WP:MPEG-7] 


MRF: See Markov random field. 


MRI: Magnetic Resonance Imaging.
See nuclear magnetic resonance.
[FP03:18.6]


MSRE: Mean Squared Reconstruction
Error. [Xu93]


MTF: See modulation transfer function. 


multibaseline stereo: Use of more 
than two cameras for binocular stereo 
depth calculations resulting in multi- 
ple baselines between pairs of cam- 
eras. It is sometimes used to make 
the correspondence problem simpler
to solve by reducing the length of the
baseline between camera pairs thereby 
effectively providing a wide baseline 
between the more distant cameras. 
[Sze10:11.6] 


multi-camera behavior correlation: 
The temporal and spatial correlation of 
activities or behaviors in multiple cam- 
era views. [LGX09] 


multi-camera distributed behavior: 
The behavior of objects as viewed 
by multiple cameras. These behaviors 
must be correlated in the temporal as 
well as the spatial domain. [LGX09] 


multi-camera system: Any system with 
multiple cameras providing observa- 
tions. 


multi-camera-system blind area: An 
area of a multi-camera system which 
is not observable, possibly caused by 
occlusion from objects within the 
scene as well as simply having no 
observing camera. [CKRH04] 


multi-camera topology: The layout of 
cameras within an environment in 
a multi-camera system. Strictly, the 
topology can be expressed as a graph, 
where arcs indicate overlapping or 
linked views (e.g., by roads). Common 
use may also include geometric rela- 
tions between cameras. Typically, this 
will include not just the positions of 
the cameras but also their orientations 
and fields of view. It is often shown in 
two dimensions, e.g., on a plan view. 
[EMB03] 


multichannel kernel: A kernel or 
matrix of weights, one for each pixel, 
which is convolved with multiple 
channels rather than the normal single- 
channel convolution operator. 


multidimensional edge detection: A 
variation on standard edge detection 
of gray scale images in which the input 
is a multi-spectral image (e.g., an RGB 
color image). The edge detection oper- 
ator may detect edges and combine 
the edges in each dimension indepen- 
dently or may use all information at 
each pixel directly. The figure shows 
edges detected from the red, green and 
blue components of an RGB image: 
[MR81]


multidimensional histogram: A
histogram with more than one
dimension. For example consider 
measurements as vectors, e.g., from 
a multi-spectral image, with N dimen- 
sions in the vector. Then one could 
create a histogram represented by an 
array with dimension N. The N compo- 
nents in each vector are used to index 
into the array. Accumulating counts 
or other evidence values in the array 
makes it a histogram. [BB82:5.3.1] 
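A sketch of a multidimensional histogram in which each component of a measurement vector indexes one axis. For brevity it stores counts in a sparse dictionary keyed by the N-dimensional bin index rather than in a dense N-dimensional array; that storage choice, the uniform bin width and the names are assumptions of this sketch.

```python
from collections import Counter


def multidimensional_histogram(vectors, bin_width):
    """Count N-dimensional vectors; each component selects the bin along one axis."""
    histogram = Counter()
    for vector in vectors:
        index = tuple(int(component // bin_width) for component in vector)
        histogram[index] += 1
    return histogram


# Hypothetical RGB measurements binned into 4 x 4 x 4 cells of width 64.
pixels = [(250, 10, 10), (240, 20, 5), (10, 10, 250), (255, 0, 0)]
hist = multidimensional_histogram(pixels, bin_width=64)
```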


multidimensional scaling (MDS): 
Scaling that starts with a (possi- 
bly incomplete) matrix of pairwise 
dissimilarities between observa- 
tions and returns a set of points in 
low-dimensional space that reflects 
the dissimilarities. Classical scaling 
treats the dissimilarities as Euclidean 
distances and obtains the output 
configuration by solving a particular 
eigen-decomposition. [MKB79:Ch. 14] 


multifilter bank: An abbreviation for 
“multiwavelet filter bank”. An array
of wavelet filters with more than one 
scaling function which separates the 
input signal into multiple components. 
[LP11] 


multifocal tensor: A geometric description
of the linear relationship between
four or more camera views. For two
and three cameras, the tensors are
called the bifocal tensor and the 
trifocal tensor, respectively. [TM99b] 


multifocus images: Images fused from 
different images at different focus lev- 
els of a scene, creating images which 
are in focus for objects at very different 
depths: [HM98] 




multigrid method: An efficient algo- 
rithm for solving systems of discretized 
differential (or other) equations. The 
term “multigrid” is used because the 
system is first solved at a coarse sam- 
pling level, which is then used to 
initialize a higher-resolution solution. 
[WP:Multigrid_method] 


multi-image registration: A general 
term for the geometric alignment of 
two or more image datasets. Alignment 
allows pixels from the different source 
images to lie on top of each other 
or to be combined. See also sensor 
fusion. For example, two overlapping 
intensity images could be registered to 
help create a mosaic. Alternatively, the 
images could be from different types 
of sensor (see multimodal fusion). For 
example, nuclear magnetic resonance 
and computed axial tomography 
images of the same body part could 
be registered to provide richer infor- 
mation to a doctor. The figure shows 
(left) two unregistered range images
and (right) the registered datasets: 
[FP03:21.3] 


multilayer perceptron network
(MLP): A form of neural network used
for supervised learning problems. It
maps input data x of dimension d to
outputs y of dimension d′.
A multilayer perceptron network is
a cascade of single-layer perceptron
networks, with different weight
matrices at each layer. The layers of
units which are neither input nor 
output units are known as hidden 
units; the number of hidden units in 
each layer is arbitrary. Typically, an 
MLP network is trained to predict 
the relationship between the xs and 
ys for a given collection of training 
examples by minimizing the error 
between the predicted and actual ys 
on the training set. The derivatives of 
this error with respect to the weights 
in the network allow minimization 
using the back-propagation algorithm. 
[Bis06:5.1] 
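
The cascade structure can be illustrated with a forward pass only. This is a sketch under stated assumptions: tanh hidden units, linear output units, and hand-picked weights; training by back-propagation is not shown, and the names `layer` and `mlp_forward` are hypothetical.

```python
import math

def layer(weights, biases, inputs, activation):
    """One single-layer perceptron: weighted sums followed by an activation."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def mlp_forward(x, hidden_w, hidden_b, out_w, out_b):
    """Cascade two layers: d inputs -> hidden units -> d' outputs."""
    hidden = layer(hidden_w, hidden_b, x, math.tanh)
    return layer(out_w, out_b, hidden, lambda a: a)  # linear output units

# Hypothetical weights: d = 2 inputs, 2 hidden units, d' = 1 output.
y = mlp_forward([0.0, 0.0],
                hidden_w=[[1.0, -1.0], [0.5, 0.5]], hidden_b=[0.0, 0.0],
                out_w=[[2.0, 3.0]], out_b=[0.25])
```

With a zero input both hidden units output tanh(0) = 0, so the network output equals the output bias.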


multilevel method: See multi-scale
method.


multilinear constraint: A generic term
for the geometric constraint on mul-
tiple views of a point. See epipolar
constraint, trilinear constraint and
quadrilinear constraint for two, three
and four views respectively.
[FP03:Ch. 10]


multilinear method: Multilinear alge-
bra generalizes linear algebra to
handle d-way arrays or tensors, as 
opposed to vectors (first-order ten- 
sors) and matrices (second-order ten- 
sors). Analogously to the singular value 
decomposition (SVD) for matrices, 
there is a d-mode SVD for tensors. An 
application in computer vision is to 
“tensor faces”, where face images are 
analyzed for factors of variation includ- 
ing person identity, facial expression, 
viewpoint and illumination. [VT02]


multimodal analysis: A general term
for image analysis using image data
from more than one sensor type. There 
is often the assumption that the data is 
registered so that each pixel records 
data of two or more types from the 
same portion of the observed scene. 
[WP:Computer_Audition#Multi- 
modal_analysis] 


multimodal fusion: See sensor fusion.
[WP:Multisensory Integration]


multimodal image alignment: The
alignment of images acquired by sen-
sors of different modalities, e.g., visible
light and infrared light. Because differ-
ent modalities generally have different
responses, alignment is generally per-
formed by aligning common features
rather than using intensity directly. 
[VW97] 


multimodal neighborhood signa- 
ture: A description of a feature point 
based on the image data in its neigh- 
borhood. The data comes from several 
registered sensors, such as X-ray and 
NMR. [MKK02]


multi-object behavior: The behavior 
(actions or movements) of multiple 
objects within a scene. [HTWM04] 


multi-ocular stereo: A stereo 
triangulation process that uses more 
than one camera to infer 3D infor- 
mation. The terms “binocular stereo” 
and “trinocular stereo” are commonly 
used when there are only two or three 
cameras, respectively. [XL07]


multi-resolution method: See
multi-scale method.


multi-scale description: See multi-scale 
method. 


multi-scale integration: 1) Combin- 
ing information extracted by using 
operators with different scales. 
2) Combining information extracted 
from the registration of images with 
different scales. These two definitions 
could just be two ways of consid- 
ering the same process if the dif- 
ference in operator scale is only a 
matter of the amount of smoothing. 
An example of multi-scale integra- 
tion occurs when combining edges 
extracted from images with differ- 
ent amounts of smoothing to produce 
more reliable edges. [SNS+98] 


multi-scale method: A general term 
for a process that uses information 
obtained from more than one scale 
of image. The different scales might 
be obtained by reducing the image 
size or by Gaussian smoothing of 
the image. Both methods reduce the 
spatial frequency of the information. 

The main reasons for using multi-scale
methods are:

• some structures have different nat-
  ural scales (e.g., a thick bar could
  also be considered to be two back-
  to-back edges)
• coarse scale information is gener-
  ally more reliable in the presence of
  image noise, but the spatial accuracy
  is better in finer scale information.

Edge detection might use a coarse scale
to reliably detect the edges and a finer
scale to locate them more accurately.
The figure shows an image with two
scales of blurring: [Wit84]


multi-scale representation: A 
representation having image features 
or descriptions that belong to two or 
more scales. An example might be 
zero crossings detected from intensity 
images that have received increasing 
amounts of Gaussian smoothing. A 
multi-scale model might represent an 
arm as a single generalized cylinder at a 
coarse scale, two generalized cylinders 
at an intermediate scale and with sur- 
face triangulation at a fine scale. The 
representation might have results from 
several discrete scales or from a more 
continuous range of scales, as in a scale 
space. The figure shows zero crossings 
found at two scales of Gaussian blur- 
ring: [WP:Scale_space#Related_multi- 
scale_representations] 


multi-sensor geometry: The relative 
placement of a set of sensors or mul- 
tiple views from a single sensor but 
from different positions. One key con- 
sequence of the different placements is 
the ability to deduce the 3D structure 
of the scene. The sensors need not be
the same type but usually are, for con- 
venience. [FP03:11.4] 


multi-spectral analysis: Using the 
observed image brightness at differ- 
ent wavelengths to aid in the under- 
standing of the observed pixels. A 
simple version uses RGB image data. 
Seven or more bands, including several 
infrared light wavelengths are often 
used for satellite remote sensing anal- 
ysis. Recent hyperspectral sensors can 
give measurements at 100-200 differ- 
ent wavelengths. [SQ04:17.1] 


multi-spectral image: An image con- 
taining data measured at more than 
one wavelength. The number of wave- 
lengths may be as low as two (in some 
medical scanners) or three (e.g., RGB 
image data), or as high as seven or more 
bands, including several infrared light 
wavelengths (e.g., satellite remote 
sensing). Recent hyperspectral sensors 
can give measurements at 100-200 
different wavelengths. The typical
representation uses a vector to record
the different spectral measurements at 
each pixel of an image array. The figure 
shows the red, green and blue compo- 
nents of an RGB image: [Umb98:1.7.4] 


multi-spectral segmentation: Segmen-
tation of a multi-spectral image. This
can be addressed by segmenting the
image channels individually and then
combining the results; alternatively 
the segmentation can be based on 
some combination of the information 
from the channels. [WP:Multispectral_ 
segmentation] 


multi-spectral thresholding: A seg- 
mentation technique for multi-spectral 
image data. A common approach is 
thresholding each spectral channel 
independently and then logically 
ANDing together the resulting images. 
An alternative is to cluster pixels in 
a multispectral space and choose 
thresholds that select desired clusters. 
The figure shows (left) a colored
image, (middle) the image thresholded 
in the blue channel (0-100 accepted) 
and (right) the image ANDed with 
the thresholded green channel (0-100 
accepted): [SHB08:5.1.3] 
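
The per-channel approach can be sketched as follows, assuming each channel is a small list-of-lists image and that values in an inclusive range are accepted (as with the 0-100 ranges in the figure description); the helper names and the pixel values are hypothetical.

```python
def threshold_channel(channel, low, high):
    """Binary mask: True where low <= value <= high."""
    return [[low <= v <= high for v in row] for row in channel]

def and_masks(a, b):
    """Pixel-wise logical AND of two binary masks."""
    return [[p and q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

# Hypothetical 2x2 blue and green channels.
blue = [[20, 150], [90, 100]]
green = [[30, 40], [200, 0]]
mask = and_masks(threshold_channel(blue, 0, 100),
                 threshold_channel(green, 0, 100))
```

Only pixels accepted in both channels survive in `mask`.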


multi-tap camera: A camera that pro-
vides multiple outputs. [HCL07]


multi-thresholding: Thresholding
using a number of thresholds giving a
result that has a number of gray scales 
or colors. The figure shows an image 
with two thresholds (113 and 200): 
[KOA92] 
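
A minimal sketch using the two thresholds mentioned above (113 and 200). It assumes that a pixel equal to a threshold belongs to the higher class; that convention, and the function name `multi_threshold`, are illustrative assumptions.

```python
from bisect import bisect_right

def multi_threshold(image, thresholds):
    """Label each pixel by the number of thresholds it reaches or exceeds."""
    t = sorted(thresholds)
    return [[bisect_right(t, v) for v in row] for row in image]

labels = multi_threshold([[0, 112, 113], [199, 200, 255]], [113, 200])
```

Two thresholds give three output levels (0, 1 and 2), which can then be mapped to gray levels or colors.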


multi-variate normal distribution: A 
Gaussian distribution for a variable that 
is a vector rather than a scalar. Let 
x be the vector variable with dimen- 
sion N. Assume that this variable has 
mean value μ and covariance matrix
C. Then the probability density of observing
the particular value x is given by:
$p(x) = (2\pi)^{-N/2} \, |C|^{-1/2} \exp\left(-\tfrac{1}{2} (x - \mu)^{\top} C^{-1} (x - \mu)\right)$
[SB11:11.11]


multi-view geometry: See multi-sensor
geometry. 


multi-view image registration: See 
multi-image registration. 


multi-view stereo: See multi-sensor 
geometry. 


multiple instance learning: A variant 
on supervised learning. Rather than 
receiving a set of input-output pairs, 
the learner receives a set of bags 
of instances. Each bag contains many 
inputs, but only one label. The label 
is negative if all inputs in the bag 
are negative and positive otherwise. 
[Mur12:1.5.4] 
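
The bag labeling rule stated above is small enough to write down directly; this sketch assumes instance labels are available as booleans, which is only true when generating or checking data (a learner sees the bag label alone).

```python
def bag_label(instance_labels):
    """A bag is positive if any instance is positive, negative only if all are negative."""
    return any(instance_labels)

negative_bag = bag_label([False, False, False])
positive_bag = bag_label([False, True, False])
```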


multiple instruction multiple data 
(MIMD): A form of parallelism in 
which, at any given time, each pro- 
cessor might be executing a differ- 
ent instruction or program on a dif- 
ferent dataset or pixel. This contrasts 
with single instruction multiple data 
parallelism where all processors exe- 
cute the same instruction simultane- 
ously although on different pixels. 
[Sch89:Ch. 8] 


multiple kernel learning: Given a num- 
ber of kernel functions that are suited 
to a problem, finding an optimal linear 
combination of these kernels. [BLJ04]


multiple light source detection: The 
location of multiple light sources 
within a scene. Much image analysis 
assumes a point light source for sim- 
plicity. [ZK02] 


multiple motion segmentation: See 
motion segmentation. 


multiple target tracking: A general 
term for tracking multiple objects 
simultaneously in an image sequence. 
Example applications include tracking 
football players or automobiles on a 
road. [HCP02] 


multiple view interpolation: A tech- 
nique for creating (or recognizing) 
new unobserved views of a scene from 
sample images captured from other 
viewpoints. [BE92] 


multiplexed illumination: The use of 
multiple light sources separated by 
color or time to illuminate a scene so
that images of the scene under each
single light source can be obtained 
by demultiplexing. Images taken with 
multiple light sources are brighter 
(and hence higher quality) than those 
acquired using separate sources. 
[WGT+05] 


multiplicative noise: A model for the 
corruption of a signal where the noise 
is proportional to the signal strength. 
$f(x, y) = g(x, y) + g(x, y) \cdot v(x, y)$
where f(x, y) is the observed signal,
g(x, y) is the ideal (original) signal and
v(x, y) is the noise. [FSSH82]
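
A small simulation of this model, assuming the noise term is drawn from a zero-mean Gaussian with standard deviation `sigma` (the definition does not fix the noise distribution); the function name is hypothetical.

```python
import random

def add_multiplicative_noise(g, sigma, rng):
    """Return f = g + g * v, with v drawn from a zero-mean Gaussian (an assumption)."""
    return [[value + value * rng.gauss(0.0, sigma) for value in row] for row in g]

rng = random.Random(0)
ideal = [[0.0, 10.0], [100.0, 200.0]]
observed = add_multiplicative_noise(ideal, sigma=0.05, rng=rng)
```

Because the perturbation is scaled by the signal, a zero-valued pixel stays at zero while bright pixels receive proportionally larger deviations.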


multi-resolution image analysis: 
The processing of images at multiple 
resolutions simultaneously. This type 
of processing may be used, e.g., to 
consider objects at multiple scales 
or to identify features at particular 
scales (e.g., SIFT). See also Gaussian 
pyramid, Laplacian pyramid and 
wavelet transform. [Sze10:3.5.3, 3.5.4] 


multi-sensor alignment: The align- 
ment of images acquired by different 
sensors, typically of different modalities
(e.g., visible light and infrared
light). See also multimodal image 
alignment. [IA98] 


multi-sensory fusion: The combina- 
tion of information from multiple dif- 
ferent sensors. This problem can be 
very difficult as the information must 
be aligned and may be very different in 
form or content. See also sensor fusion. 
[WP:Image_fusion] 


multi-view activity representation: A 
representation of activities across mul- 
tiple views in a multi-camera system. 


Munsell color notation system: A sys- 
tem for precisely specifying colors and 
their relationships, based on hue, value 
(brightness) and chroma (saturation). 
The Munsell Book of Color contains 
colored chips indexed by these three 
attributes. The color of any unknown 
surface can be identified by compari- 
son with the colors in the book under 
specified lighting and viewing condi- 
tions. [GM03:5.3.6] 


mutual illumination: When light
reflecting from one surface illumi-
nates another surface and vice versa.
The consequence of this is that light
observed coming from a surface is a 
function of not only the light source 
spectrum and the reflectance of the tar- 
get surface, but also the reflectance of 
the nearby surface (through the spec- 
trum of the light reflecting from the 
nearby surface onto the first surface). 
The figure shows how mutual illumina- 
tion can occur: [FZ89] 


mutual information: The amount of
information two random variables
have in common. The mutual informa-
tion I(X; Y) is defined as
$I(X;Y) = H(X) + H(Y) - H(X,Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)$,
where H(·) denotes entropy. Notice that
I(X; Y) is symmetric. [CT91:Ch. 2] 
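
The first form of the definition, H(X) + H(Y) - H(X, Y), can be evaluated directly from a joint probability table. This sketch assumes discrete variables, a table indexed as `joint[x][y]`, and entropy measured in bits.

```python
from math import log2

def entropy(probabilities):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * log2(p) for p in probabilities if p > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint probability table joint[x][y]."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    p_xy = [p for row in joint for p in row]
    return entropy(p_x) + entropy(p_y) - entropy(p_xy)

independent = [[0.25, 0.25], [0.25, 0.25]]  # X and Y share no information
identical = [[0.5, 0.0], [0.0, 0.5]]        # Y is a copy of X
```

For the independent table the mutual information is zero; when Y copies X it equals H(X), which is one bit here.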


mutual interreflection: See mutual
illumination.


naive Bayes classifier: A Bayesian
classifier which uses an indepen-
dence assumption to model the class-
conditional distribution. If x denotes
the d-dimensional feature vector and
c the class label, then the naive
Bayes assumption is that
$p(x|c) = \prod_{i=1}^{d} p(x_i|c)$. [Bis06:p. 380]
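
The factorization can be illustrated for discrete features. The likelihood tables below are invented for the example, and `class_conditional` is a hypothetical helper name; a full classifier would also multiply by the class prior and compare classes.

```python
from math import prod

def class_conditional(x, feature_likelihoods):
    """p(x|c) as the product over features of p(x_i|c)."""
    return prod(table[value] for table, value in zip(feature_likelihoods, x))

# Hypothetical per-feature likelihood tables p(x_i | c) for one class c.
likelihoods_c = [{"red": 0.8, "blue": 0.2}, {"round": 0.5, "square": 0.5}]
p = class_conditional(("red", "round"), likelihoods_c)
```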


NAND operator: An image arithmetic 
operation where a new image is 
formed by NANDing (logical AND fol- 
lowed by NOT) corresponding bits for 
every pixel of the two images. This 
operator is most appropriate for binary 
images but may also be applied to gray 
scale images. The figure shows the 
NAND operator applied to two binary 
images: [SB11:3.2.2] 
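
A minimal sketch of the bitwise operation, assuming images are lists of rows of unsigned integer pixels with a known bit depth (`bits` is a parameter introduced here so that inversion stays within the pixel range).

```python
def nand_images(a, b, bits=8):
    """Pixel-wise NAND: AND the corresponding bits, then invert them."""
    mask = (1 << bits) - 1  # keep the result within the pixel bit depth
    return [[~(p & q) & mask for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

# Binary images stored as 1-bit pixels.
a = [[1, 1], [0, 0]]
b = [[1, 0], [1, 0]]
result = nand_images(a, b, bits=1)
```

The same function applies to gray scale images by leaving `bits` at the pixel depth, e.g. 8.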


narrow baseline stereo: A form of
stereo triangulation in which the 
sensor positions are close together. 
The baseline is the distance between 
the sensor positions. Narrow baseline 
stereo often occurs when the image 
data is from a video sequence taken 
by a moving camera. [TNMO09] 


natural image statistics: The statistics 
or statistical structure of images of 
the natural world. Often used in the 
computational modeling of biological 
vision systems. [SO01] 


natural material recognition: The 
recognition of materials which occur 
naturally (as opposed to man-made 
materials). [LM01]


natural neighbor interpolation: A 
form of spatial interpolation employing
a Voronoi segmented image to provide
weighted values for the interpolation.
[Cre93:pp. 374-376] 


near infrared: Light wavelengths ap- 
proximately in the range 750-5000 
nm. [WP:Infrared] 


near light source: A light source close 
to the illuminated object such that the 
rays of light are still effectively expand- 
ing from a point: 


[Figure: a far light source and a near light source]


See also far light source. 


near duplicate image/video retrieval: 
The identification of images or videos 
that are almost identical to a known 
image or video, perhaps after some 
transformations, such as compression, 
resizing or cropping. [SZH+07] 


nearest neighbor: A classification, label- 
ing or grouping principle in which a 
data item is associated with or takes 
the same label as the previously clas- 
sified data item that is nearest to the 
first data item. The “nearness” might 
be based on spatial distance or a dis- 
tance in a property space. In the figure, 
the unknown square is classified with 
the label of the nearest point, a circle: 
[JKS95:15.5.1]
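
The principle can be sketched with Euclidean distance in a 2D property space; the training points and labels are invented for the example, and any other distance could be substituted.

```python
from math import dist

def nearest_neighbor_label(query, labeled_points):
    """Return the label of the previously classified point closest to the query."""
    point, label = min(labeled_points, key=lambda item: dist(item[0], query))
    return label

training = [((0.0, 0.0), "circle"), ((5.0, 5.0), "triangle"), ((1.0, 0.5), "circle")]
label = nearest_neighbor_label((4.0, 4.5), training)
```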


Necker cube: A line drawing of a
cube under orthographic projection, 
which can be interpreted in two ways: 
[Nal93:Ch. 4] 


Necker reversal: An ambiguity in the
recovery of 3D structure from multi- 
ple images. Under affine viewing con- 
ditions, the sequence of 2D images of 
a set of rotating 3D points is the same 
as the sequence produced by rotation 
in the opposite direction of a different 
set of points, so that two solutions to 
the structure and motion problem are 
possible. The different set of points is 
the reflection of the first set about any 
plane perpendicular to the optical axis 
of the camera. [HZ00:13.6] 


needle map: An image representation 
used for displaying 2D and 3D vector 
fields, such as surface normals. Each 
pixel has a vector. Diagrams showing 
them use little lines with the magni- 
tude and direction of the vector pro- 
jected onto the image of a 3D vec- 
tor. To avoid overcrowding the image, 
the pixels where the lines are drawn 
are a subset of the full image. The 
figure shows a needle map of the 
surface normals on the block sides: 
[Hor86:11.8] 


negate operator: See invert operator. 


neighborhood: 1) The neighborhood of
a vertex v in a graph is the set of ver-
tices that are connected to v by an arc.

2) The neighborhood of a point (or
pixel) x is a set of points “near” x. A
common definition is the set of points
within a certain distance of x, where
the metric may be Manhattan distance
or Euclidean distance.

3) The 4-connected neighborhood of
a 2D location (x, y) is the set of image
locations $\{(x+1, y), (x-1, y), (x, y+1), (x, y-1)\}$.
The 8-connected
neighborhood is the set of pixels
$\{(x+i, y+j) \mid -1 \le i, j \le 1\}$. The
26-connected neighborhood of a 3D
point (x, y, z) is defined analogously:
[SQ04:4.5]


neural network: Artificial neural net-
works (ANNs) are composed of units
(artificial neurons) connected into net- 
works. The connections usually have 
adjustable strength, known as weights. 
The adjustment of the weights by 
learning algorithms (or training algo- 
rithms) allows neural networks to per- 
form pattern recognition tasks such as 
supervised learning and unsupervised 
learning. ANNs can be thought of as 
simplified or abstracted models of bio- 
logical networks, although a lot of 
work is solely concerned with the pat- 
tern recognition properties of these 
nonlinear models. ANNs can have dif- 
ferent architectures, such as a feed- 
forward layout from input to output 
via intermediate hidden layers (as used 
in the multilayer perceptron network 
and radial basis function network), 
or a recurrent architecture involving 
feedback (as used in the Hopfield 
network and the Kohonen network). 
[Bis06:Ch. 5]


neutral expression: A blank or non-
emotional expression on a person’s
face. [WP:Facial_expression] 


Newton optimization: To find a local
minimum of function $f : \mathbb{R}^n \to \mathbb{R}$
from starting position $x_0$. Given the
function’s gradient $\nabla f$ and Hessian $H$
evaluated at $x_k$, the Newton update is
$x_{k+1} = x_k - H^{-1} \nabla f$. If f is a (positive definite) quadratic
form then a single Newton step will
directly yield the global minimum. For
general f, repeated Newton steps will
generally converge to a local optimum.
[FP03:3.1.2] 
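
The single-step property for quadratics can be checked numerically. This sketch is restricted to functions of two variables so that the Hessian can be inverted in closed form; the test function and the names `newton_step`, `gradient` and `hessian` are assumptions made for the example.

```python
def newton_step(x, gradient, hessian):
    """One Newton update x - H^{-1} grad f for a function of two variables."""
    (a, b), (c, d) = hessian(x)
    g0, g1 = gradient(x)
    det = a * d - b * c
    # Solve H * step = grad f with the closed-form 2x2 inverse.
    step = ((d * g0 - b * g1) / det, (a * g1 - c * g0) / det)
    return (x[0] - step[0], x[1] - step[1])

# Hypothetical quadratic f(x, y) = (x - 1)^2 + 2 (y + 3)^2, minimum at (1, -3).
gradient = lambda p: (2 * (p[0] - 1), 4 * (p[1] + 3))
hessian = lambda p: ((2.0, 0.0), (0.0, 4.0))
x1 = newton_step((10.0, 10.0), gradient, hessian)
```

Starting from (10, 10), one step lands on the minimum; for a general function the step would be repeated until the gradient is small.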


next view planning: When inspecting
an object or obtaining a geometric
model or appearance-based model, it
may be necessary to observe the object
from several places. Next view plan-
ning determines where to place the
camera next (by moving either the
object or the camera) based on what
was observed (in the case of unknown
objects) or a geometric model (in the
case of known objects). [RSB04]


next view prediction: See next view
planning.


NMR: See nuclear magnetic resonance.


node of graph: A symbolic represen-
tation of some entity or feature. It
is connected to other nodes in a
graph by arcs, that represent relation-
ships between the different entities.
[SQ04:12.1]


noise: A general term for the deviation
of a signal away from its “true” value.
In the case of images, this leads to
pixel values (or other measurements)
that are different from their expected
values. The causes of noise can be
random factors, such as thermal noise
in the sensor, or minor scene events,
such as dust or smoke. Noise can
also represent systematic, but unmod-
eled, events such as short-term lighting
variations or quantization. Noise might
be reduced or removed using a
noise reduction method. The fig-
ure shows images without and with
salt-and-pepper noise: [TV98:3.1]


noise characteristics: See noise model.


noise model: A way to model the sta-
tistical properties of noise without
having to model the causes of the
noise. One general assumption about
noise is that it has some underlying,
but perhaps unknown, distribution. A
Gaussian noise model is commonly
used for random factors and a uniform
distribution is often used for unmod-
eled scene effects. Noise could be mod-
eled with a mixture model. The noise
model typically has one or more param-
eters that control the magnitude of the
noise. The noise model can also spec-
ify how the noise affects the signal, e.g.
additive noise offsets the true value and
multiplicative noise rescales the true
value. The type of noise model can
constrain the noise reduction method.
[Jai89:8.2]


noise reduction: An image-processing
method that tries to reduce the dis-
tortion of an image that has been
caused by noise. For example, the 
images from a video sequence taken 
with a stationary camera and scene 
can be averaged together to reduce 
the effect of Gaussian noise because 
the average value of a signal corrupted 
with this type of noise converges to 
the true value. Noise-reduction meth- 
ods often introduce other distortions, 
but these may be less significant to 
the application than the original noise. 
The figure shows (left) an image with 
salt-and-pepper noise and (right) its
noise reduced by median smoothing:
[TV98:3.2] 
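
Both techniques mentioned above can be sketched on one-dimensional rows of pixels; real images would apply the same idea over 2D windows. The data values, the window radius and the clipped-window border handling are assumptions of this example.

```python
from statistics import mean, median

def average_frames(frames):
    """Average co-located pixels across frames of a static scene (1D rows here)."""
    return [mean(values) for values in zip(*frames)]

def median_smooth(signal, radius=1):
    """Replace each sample by the median of its window; borders use a clipped window."""
    return [median(signal[max(0, i - radius): i + radius + 1])
            for i in range(len(signal))]

frames = [[10, 20, 30], [12, 18, 30], [11, 22, 30]]
averaged = average_frames(frames)
row = [5, 5, 255, 5, 5, 0, 5, 5]  # 255 and 0 are salt-and-pepper outliers
smoothed = median_smooth(row)
```

Averaging suits zero-mean noise such as Gaussian noise, while the median discards isolated outliers of the salt-and-pepper kind.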


noise removal: See noise reduction. 


noise source: A general term for phe-
nomena that corrupt image data. It
could be systematic unmodeled pro-
cesses (e.g., 60 Hz electromagnetic
noise) or random processes (e.g., elec-
tronic shot noise). The sources could
be in the scene (e.g., chaff), in the 
medium (e.g., dust), in the lens (e.g., 
imperfections) or in the sensor (e.g., 
sensitivity variations). [WP:Noise] 


noise suppression: See noise 
reduction. 


noise-whitening filter: A noise- 
modifying filter that outputs images 
whose pixels have noise that is 
independent of other pixels’ noise 
(spatial noise) or values of that pixel 
at other times (temporal noise). The 
resulting image’s noise is white noise. 
[Jai89:6.2] 


noiselets: Noise-like functions that are 
complementary to wavelets. They 
result in very poor or no compression 
for orthogonal wavelet compression 
functions. [WP:Noiselet] 


non-accidentalness: A general princi- 
ple that can be used to improve image 
interpretation based on the concept 
that when regularities appear in an 
image, they are most likely to result 
from regularities in the scene. For 
example, if two straight lines end near 
each other, then this could have arisen 
from a coincidental alignment of the 
line ends and the observer. However, 
it is much more probable that the two 
lines end at the same point in the 
observed scene. The figure shows line 
terminations and orientations that are 
unlikely to be coincidental: [OCEG04] 


[Figure: examples of non-accidental termination and non-accidental parallelism]


non-affective gesture: Human gestures, 
such as sign language, which are not 
emotional. [WP:Gesture] 


non-central camera: A camera which 
cannot be modeled by central 
projection. Such cameras have been 
developed for applications such as
panoramic imaging. [MP04] 


non-convexity: A description of a func-
tion, object, set or error-space which is
not convex. This can lead to local minima
during optimization. [BV04:2.1.4, 3.1]


non-hierarchical control: A way of
structuring the sequence of actions
in an image interpretation system: 
there is no master process that orders 
the sequence of actions or operators 
applied. Instead, typically, each opera- 
tor can observe the current results and 
decide if it is capable of executing and 
if it is desirable to do so. [SHB08:8.1.6] 


non-Lambertian reflection, non- 
Lambertian surface: Light which is 
not reflected evenly in all directions 
from a surface is non-Lambertian. 
Many objects are non-Lambertian (e.g., 
those exhibiting specular reflections).
See also Lambertian surface.
[Sze10:2.2.2]


nonlinear diffusion: A diffusion func-
tion which is dependent on image posi-
tion. Nonlinear diffusion functions are
based on partial differential equations 
and are used, e.g., in image processing 
for denoising images. [PGPO94] 


nonlinear filter: A process where the
outputs are a nonlinear function of the
inputs. This covers a large range of
algorithms. Examples of nonlinearity
include:

• doubling the values of all input
  data does not double the values of
  the output results (e.g., a filter that
  reports the position at which a given
  value appears);
• applying an operator to the sum of
  two images gives different results
  from adding the results of the oper-
  ator to the two original images (e.g.,
  thresholding).

[Jai89:8.5] 
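
The second example (thresholding is not additive) can be demonstrated with two tiny images; the pixel values and threshold are chosen only to make the difference visible.

```python
def threshold(image, t):
    """Binary threshold: 1 where the pixel value is at least t, else 0."""
    return [1 if v >= t else 0 for v in image]

def add(a, b):
    """Pixel-wise sum of two images of equal size."""
    return [p + q for p, q in zip(a, b)]

a, b, t = [60, 10], [60, 120], 100
filtered_sum = threshold(add(a, b), t)                   # operator applied to the sum
sum_of_filtered = add(threshold(a, t), threshold(b, t))  # sum of the operator outputs
```

The two results differ, so the operator fails the superposition test that a linear filter would pass.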


non-local patches: Non-neighboring 
patches or image regions that can 
be used in image processing to 
address restoration, segmentation etc. 
[BCM05] 


non-maximal suppression: A tech- 
nique for suppressing multiple 
responses (e.g., high values of gradient 
magnitude) representing a single 
edge or other feature. The resulting 
edges should be a single pixel wide. 
[JKS95:5.6.1] 
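
A one-dimensional sketch of the idea, assuming responses are sampled along the direction across an edge and that a fixed window radius defines which responses compete; ties within a window are all kept in this simple version.

```python
def non_max_suppress_1d(responses, radius=1):
    """Keep a response only if it is the largest value in its local window."""
    kept = []
    for i, r in enumerate(responses):
        window = responses[max(0, i - radius): i + radius + 1]
        kept.append(r if r > 0 and r == max(window) else 0)
    return kept

gradient_magnitude = [0, 2, 7, 9, 6, 1, 0, 3, 8, 4]
thin = non_max_suppress_1d(gradient_magnitude)
```

Each broad ridge of responses collapses to a single sample, which is the one-pixel-wide result described above.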


non-negative matrix factorization 
(NMF): Given a data matrix X contain- 
ing non-negative entries, X is decom- 
posed as the product WH of two 
lower rank non-negative matrices W 
and H. This can be interpreted as 
decomposing a data vector (a row
of X) as a weighted combination of
basis functions, where the rows of H
are the basis functions and the correspond-
ing row of W gives the weights.
Contrast with the singular value 
decomposition (SVD) of X, which does 
not impose non-negativity constraints. 
[HTF08:14.6] 


non-overlapping field of view: Cam- 
eras which do not view any common 
area: [PD03] 


[Figure: cameras, each with its own field of view]


non-parametric clustering: A data 
clustering process (such as the mean 
shift and support vector clustering 
methods) that uses peaks in the 
observed data distribution rather than
assuming an underlying probability dis-
tribution. [MMK04]


non-parametric method: A statistical 
model that allows the number of 
parameters to grow with the amount of 
data. This has the advantage of making 
fewer assumptions about the underly- 
ing distribution. The Parzen window 
and the k-nearest-neighbor algorithm 
are examples of non-parametric meth- 
ods. [Mur12:1.2.11] 


non-rigid model representation: A 
model representation where the shape 
of the model can change, perhaps 
under the control of a few parameters. 
These models are useful for represent- 
ing objects whose shape can change, 
such as moving humans or biological 
specimens. The differences in shape 
may occur over time or between dif- 
ferent instances. Changes in appar- 
ent shape because of perspective 
projection and observer viewpoint are 
not relevant here. By contrast, a rigid 
model would have the same actual 
shape irrespective of the viewpoint of 
the observer. [BY95] 


non-rigid motion: A motion of an 
object in the scene in which the 
shape of the object also changes. Exam- 
ples include the position of a walk- 
ing person’s limbs and the shape of 
a beating heart. Changes in appar- 
ent shape because of perspective 
projection and viewpoint are not rel- 
evant here. [KG92] 


non-rigid registration: The problem 
of registering, or aligning, two shapes 
that can take on a variety of con- 
figurations (unlike rigid shapes). For 
instance, a walking person, a fish, and 
facial features like mouth and eyes 
are all non-rigid objects, the shape 
of which changes in time. This type 
of registration is frequently needed in 
medical imaging as many human body 
parts deform. Non-rigid registration 
is considerably more complex than 
rigid registration. See also alignment, 
registration and rigid registration. 
[CR03]


non-rigid structure from motion: The 
recovery of the structure of non-rigid 
3D objects from a sequence of possibly
uncalibrated 2D images where the non- 
rigid objects are moving. [THB08] 


non-rigid tracking: A tracking process 
that is designed to track non-rigid 
objects. This means that it can cope 
with changes in actual object shape 
as well as apparent shape because of 
perspective projection and observer 
viewpoint. [CRM00] 


non-symbolic representation: A 
model representation in which the 
appearance is described by a numerical 
or image-based description rather than 
a symbolic or mathematical descrip- 
tion. For example, a non-symbolic 
model of a line would be a list of 
the coordinates of the points in the 
line or an image of the line. Compare 
with symbolic object representation. 
[FM98] 


non-uniform illumination: Lighting 
which is uneven, resulting in dif- 
ferent amounts of light falling on 
different parts of the scene. This 
type of illumination makes tasks like 
thresholding more difficult: [KY99] 


nonverbal communication: Commu- 
nication between people without 
words, e.g., through body language, 
facial expression or sign language. 
[WP:Nonverbal_communication] 


normal curvature: A plane that con-
tains the surface normal n at point P to
a surface intersects that surface to form
a planar curve Γ that passes through P.
The normal curvature is the curvature
of Γ at P. The intersecting plane can be
at any specified orientation about the
surface normal: [JKS95:13.3.2]


normal distribution: See Gaussian
distribution.


normal flow: The component of optical
flow in the direction of the intensity
gradient. The orthogonal component 
is not locally observable because small 
motions orthogonally do not change 
the appearance of local neighbor- 
hoods. [CHHN98] 


normalized compression distance
(NCD): The normalized compression
distance between any two objects x
and y is given by
$\mathrm{NCD}(x, y) = \frac{C(xy) - \min(C(x), C(y))}{\max(C(x), C(y))}$
where C(x) is the compressed length
of x and xy denotes the concatenation
of x and y. It can be used as a basis for
clustering. [CV05b]
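
A minimal sketch of the distance, assuming zlib as the compressor C (any lossless compressor could be used, and the numerical values depend on that choice); the byte strings are invented test data.

```python
import zlib

def compressed_length(data: bytes) -> int:
    """C(x): length in bytes of the zlib-compressed data (one possible compressor)."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy, cxy = compressed_length(x), compressed_length(y), compressed_length(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

a = b"the quick brown fox jumps over the lazy dog " * 20
b = bytes(range(256)) * 4
same = ncd(a, a)
different = ncd(a, b)
```

Concatenating an object with itself adds little to the compressed length, so `same` is small, while unrelated data give a value much closer to 1.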


normalized correlation: See
correlation.


normalized discounted cumulative
gain: In an image retrieval situation,
the result of a query is a ranked list of 
images. To evaluate this result, assume 
that each image has been annotated 
with a relevance score. The discounted 
cumulative gain (DCG) measures the 
quality of the result, penalizing highly 
relevant images which appear low 
down the ranked list. The normaliza- 
tion computes the ratio of the observed 
DCG to the best DCG that could be 
obtained. [MRS08:8.4] 
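
The computation can be sketched as follows. The logarithmic discount 1 / log2(rank + 1) is a common choice and is an assumption here, since the entry does not fix the discount function; relevance scores are listed in ranked order.

```python
from math import log2

def dcg(relevances):
    """Discounted cumulative gain with a log2 rank discount (one common choice)."""
    return sum(rel / log2(rank + 1) for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    """Ratio of the observed DCG to the DCG of the ideal (sorted) ranking."""
    best = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / best if best > 0 else 0.0

ideal_order = ndcg([3, 2, 1])
reversed_order = ndcg([1, 2, 3])
```

A ranking that already lists the most relevant images first scores 1; placing them lower reduces the score.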


NOT operator: See invert operator. 


novel view synthesis: A process
whereby a new view of an object
is synthesized by combining infor- 
mation from several images of the 
object from different viewpoints. One 
method is by 3D reconstruction, e.g., 
from binocular stereo, and then ren- 
dering the reconstruction using com- 
puter graphics. However, the main 
approaches to novel view synthesis 
use epipolar geometry and the pixels 
of two or more images of the object 
to directly synthesize a new image 
without creating a 3D reconstruction. 
[AS97] 


NP-complete: A concept in computational
complexity covering a special
set of problems. All of these problems
currently can be solved, in the worst
case, in time exponential O(e^N) in the
number or size N of their input data.
For the subset of exponential problems
called NP-complete, if an algorithm
for one could be found that executes
in polynomial time O(N^p) for
some p, then a related algorithm could
be found for any other NP-complete
problem. [SQ04:12.5]


NTSC: National Television System Committee.
A television signal recording
system used for encoding video data at 
approximately 60 video fields per sec- 
ond. Used in the USA, Japan and other 
countries. [Jai89:4.1] 


nuclear magnetic resonance (NMR): 
An imaging technique based on mag- 
netic properties of the atomic nuclei. 
Protons and neutrons within atomic 
nuclei generate a magnetic dipole that 
can respond to an external magnetic 
field. Several properties related to the 
relaxation of that magnetic dipole give 
rise to values that depend on the tis- 
sue type, thus allowing identification 
or at least visualization of the different
soft tissue types. The measurement
of the signal is a way of measuring 
the density of certain types of atoms, 
such as hydrogen in the case of bio- 
logical NMR scanners. This technol- 
ogy is used for medical body scan- 
ning, where a detailed 3D volumetric 
image can be produced. Signal lev- 
els are highly correlated with different 
biological structures so one can easily 
observe different tissues and their positions.
Also called magnetic resonance
imaging (MRI). [FP03:18.6]


nuclear norm: A norm defined on the
singular values of a matrix, rather than
on the entries. The norm of an m × n
matrix A with singular values $\sigma_i$ is
defined as

$$\|A\|_* = \operatorname{trace}\left(\sqrt{A^* A}\right) = \sum_{i=1}^{\min(m,n)} \sigma_i$$

This norm is invariant to
unitary transformations of A, e.g.,
$\|A\|_* = \|UAV\|_*$ where U and V are
unitary matrices. The norm is often
used in optimization or dimensionality
reduction problems where there is a
constraint on the rank of a matrix.
[JS10]
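A small worked example, with a matrix chosen purely
for illustration, shows how the definition is
evaluated and why the nuclear norm is a sum of
singular values and not of eigenvalues:

```latex
A = \begin{pmatrix} 3 & 0 \\ 0 & -4 \end{pmatrix}, \qquad
A^{*}A = \begin{pmatrix} 9 & 0 \\ 0 & 16 \end{pmatrix}, \qquad
\sqrt{A^{*}A} = \begin{pmatrix} 3 & 0 \\ 0 & 4 \end{pmatrix}

\|A\|_{*} = \operatorname{trace}\sqrt{A^{*}A} = \sigma_1 + \sigma_2 = 4 + 3 = 7
```

The eigenvalues of this A are 3 and -4, which sum to
-1, whereas the singular values 4 and 3 sum to 7.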


number plate recognition: See license 
plate recognition. 


NURBS: Non-uniform rational b-splines: 
a type of shape modeling primitive 
based on ratios of b-splines. Capa- 
ble of accurately representing a wide 
range of geometric shapes including 
freeform surfaces. [WP:Non-uniform_ 
rational_B-spline] 


Nyquist frequency: The minimum 
sampling frequency for which the 
underlying true image (or signal) can 
be reconstructed from the samples. If 
sampling at a lower frequency, then 
aliasing will occur, creating apparent 
structure that does not exist in the orig- 
inal image. [SB11:2.3.2.1] 
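The aliasing effect can be checked numerically. The
sketch below is an illustration only; the 7 Hz signal
and the 10 Hz sampling frequency are assumed values
chosen so that the sampling frequency is less than
twice the signal frequency.

```python
import math


def sample(freq_hz, rate_hz, n_samples):
    """Sample cos(2*pi*f*t) at times t = n / rate for n = 0..n_samples-1."""
    return [math.cos(2 * math.pi * freq_hz * n / rate_hz)
            for n in range(n_samples)]


rate = 10.0                     # sampling frequency (Hz), below 2 * 7 Hz
high = sample(7.0, rate, 20)    # 7 Hz signal, undersampled
alias = sample(3.0, rate, 20)   # 3 Hz = 10 Hz - 7 Hz

# The two sample sequences coincide, so after sampling the 7 Hz signal
# cannot be told apart from a 3 Hz signal that was never present.
max_diff = max(abs(a - b) for a, b in zip(high, alias))
print(max_diff < 1e-9)
```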


Nyquist sampling rate: See Nyquist 
frequency.


object: 1) A general term referring to
a group of features in a scene that 
humans consider to compose a larger 
structure. In vision, it is generally 
thought of as that to which attention 
is directed. 

2) A general system theory term, where 
the object is what is of interest (unlike 
the background). Resolution or scale 
may determine what is considered the 
object. [Low91:p. 236] 


object-based representation: A repre- 
sentation that specifies the classes and 
poses of objects. This type of represen- 
tation facilitates reasoning about the 
objects in a scene. [DM98] 


object-centered representation: A
model representation in which the
position of the features and compo- 
nents of the model are described rel- 
ative to the position of the object 
itself. This might be a relative descrip- 
tion (the nose is 4 cm from the 
mouth) or might use a local coordinate 
system (e.g., the right eye is at posi- 
tion (0,25,10) where (0,0,0) is the 
nose.) This contrasts with, e.g., a 
viewer-centered representation. The 
figure shows a rectangular solid 
defined in its local coordinate system: 
[JKS95:15.3.2] 


object class recognition (categorization):
The identification of the class
(or type) of an unknown object, as
contrasted with identifying the specific
type or even individual (e.g., identifying
an unknown object as a chair rather
than identifying it as a specific type of
chair). [Sze10:14.4]


object contour: See occluding contour.


object detection: The discovery of
objects within a scene or image.
[Sze10:14.1]


object grouping: A general term meaning
the clustering of all the image data
associated with a distinct observed
object. For example, when observing a
person, the object grouping could cluster
all of the pixels from the image of
the person. [FP03:24.1]


object indexing: A method of organizing
object models based on some
(probably shape-based) primitive in
order to allow similar models to be
located efficiently. See also indexing
and model base indexing. [RB95]


object localization: The process of
determining the position of an object
in a scene or image. [LBH08]


object plane: In the case of convex
simple lenses, typically used in laboratory
TV cameras, the object plane is
the 3D scene plane where all points
are exactly in focus on the image plane
(assuming a perfect lens and the optical
axis perpendicular to the image plane):
[WP:Microscopy#Oblique_illumination]

[Figure labels: lens, optical axis, image plane, object plane]


object recognition: A general term for
identifying which of several (or many)
possible objects is observed in an
image. The process may also include
computing the object’s image or scene
position, or labeling the image pixels
or image features that belong to the
object. [FP03:21.4]


object representation: An encoding
of an object into a form suitable
for computer manipulation. The mod- 
els could be, e.g., geometric models, 
graph models or appearance models. 
[JKS95:15.3] 


object segmentation: The separation
of objects within a scene
or image. Typically addressed using
either region-based segmentation or
edge-based segmentation. Compare
with image segmentation. [Sze10:Ch. 5]


object verification: A component of
an object recognition process that
attempts to verify a hypothesized 
object identity by examining evidence. 
Commonly, geometric models are used 
to verify that object features are 
observed in the correct image posi- 
tions. [FP03:18.5] 


objective function: 1) The cost function
used in an optimization process.
2) A measure of the misfit between the 
data and the model. [SQ04:2.3] 


oblique illumination: See low-angle 
illumination. 


observation probability: In a state
space model the underlying hidden
state variable is not observed directly,
but observations are made according
to the observation (or emission) probability
distribution $p(x_t \mid z_t)$, where $z_t$
denotes the hidden state at time t and
$x_t$ denotes the corresponding observed
variable. [Bis06:13.2]


observation variable: In a state space
model there is an observation variable
$x_t$ at time t corresponding to
the hidden state variable $z_t$. They are
linked via the observation probability
distribution. [Bis06:13.1]


observational space: In a state space 
model the hidden state variables 
reside in the state space, while the 
observation variables reside in the 
observational space. [Bis06:13.1] 


observer: The individual (or camera) 
making observations. Most frequently 
this refers to the camera system from 
which images are being supplied. 
See also observer motion estimation. 
[WP: Observer] 


observer motion estimation: When an 
observer is moving, image data of the 
scene provides optical flow or track- 
able scene feature points. These allow 
an estimate of how the observer is mov- 
ing relative to the scene, which is use- 
ful for navigation control and position 
estimation. [Hor86:17.1] 


obstacle detection: Using visual data to 
detect objects in front of the observer, 
usually for mobile robotics applications.
[LAT02]


Occam’s razor: An argument attributed 
to William of Occam (Ockham), an 
English nominalist philosopher of the 
early 14th century, stating that assump- 
tions must not be needlessly multiplied 
when explaining something (entia 
non sunt multiplicanda praeter 
necessitatem). Often used simply to 
suggest that, other conditions being 
equal, the simplest solution must be 
preferred. Notice variant spelling Ockham.
See also minimum description
length. [WP:Occam’s_razor] 


occluding contour: The visible edge of 
a smooth curved surface as it bends 
away from an observer. The occlud- 
ing contour defines a 3D space curve 
on the surface, such that a line of 
sight from the observer to a point 
on the space curve is perpendicular 
to the surface normal at that point. 
The 2D image of this curve may also 
be called the occluding contour. The 
contour can often be found by an 
edge detection process. In the figure, 
the left and right cylinder boundaries 
are occluding contours from our viewpoint:
[FP03:19.2]


occluding contour analysis: A general
term that includes:
• occluding contour detection;
• inference of the shape of the 3D
  surface at the occluding contour;
• determination of the relative depth
  of the surfaces on both sides of the
  occluding contour.

[FP03:19.2]


occluding contour detection: Deter- 
mining which of the image edges 
arise from an occluding contour. 
[FP03:19.2] 


occlusion: When an object lies between 
an observer and another object, the 
closer object occludes the more distant 
one in the acquired image. The 
occluded surface is the portion of 
the more distant object hidden by the 
closer object. In the figure, the cylin- 
der occludes the more distant brick: 


[Dav90:7.7] 


occlusion detection: The identification 
of features or objects within a scene 
which cause occlusion (visual overlap- 
ping) with other parts of the scene. 
The occlusion boundaries are high- 
lighted in the image on the right: 
[ZK00]


occlusion recovery: The process of 
attempting to infer the shape and 
appearance of a surface hidden 
by occlusion. This recovery helps 
improve completeness when recon- 
structing scenes and objects for virtual 
reality. The figure shows two occluded
pipes and an estimated recovery:
[Dav90:7.7]


occlusion understanding: A general
term for analyzing scene occlusions
that may include occluding contour 
detection, determining the relative 
depths of the surfaces on both sides 
of an occluding contour and searching 
for tee junctions as a cue for occlusion, 
depth order etc. [Dav90:7.7] 


occupancy grid: A map construction
technique used mainly for autonomous
vehicle navigation. The grid is a
set of squares or cubes representing
the scene, which are marked
according to whether the observer
believes the corresponding scene
region is empty (hence navigable) or
full. A probabilistic measure could
also be used. Evidence from
range sensors, binocular stereo sensors
or acoustic sonar sensors is typically
used to construct and update
the grid as the observer moves.
[WP:Occupancy_grid_mapping]
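The probabilistic variant can be sketched as follows.
This is an illustration under stated assumptions, not
a method taken from the cited source: each cell stores
the log-odds of being occupied, and successive sensor
readings are treated as independent evidence. The
class name and the reading probabilities are
hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


class OccupancyGrid:
    """2D grid storing, per cell, the log-odds that the cell is occupied."""

    def __init__(self, rows, cols):
        # Log-odds 0.0 corresponds to probability 0.5 (unknown).
        self.log_odds = [[0.0] * cols for _ in range(rows)]

    def update(self, row, col, p_occupied):
        """Fuse one reading that suggests occupancy with probability p_occupied."""
        self.log_odds[row][col] += logit(p_occupied)

    def probability(self, row, col):
        """Current belief that the cell is occupied."""
        return 1.0 / (1.0 + math.exp(-self.log_odds[row][col]))


grid = OccupancyGrid(4, 4)
for _ in range(3):
    grid.update(1, 2, 0.7)   # three readings suggesting an obstacle
grid.update(0, 0, 0.2)       # one reading suggesting free space
print(round(grid.probability(1, 2), 3))
```

Thresholding the stored probabilities recovers the
binary empty/full marking described above.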


OCR: See optical character recognition. 


octree: A volumetric representation in
which 3D space is recursively divided
into eight (hence “oct”) smaller volumes
by planes parallel to the XY,
YZ, XZ coordinate system planes. A
tree is formed by linking the eight
subvolumes to each parent volume.
Additional subdivision need not occur
when a volume contains only object or
empty space. This representation thus
can be more efficient than a pure voxel
representation. The figure shows three
levels of a pictorial representation of
an octree, where one octant of the
largest (leftmost) level is expanded to
give the middle figure, and similarly an
octant of the middle: [CH88]
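The recursive subdivision can be sketched in a few
lines of Python. The node encoding ('empty', 'full'
or a list of eight children) and the function names
are assumptions made for this illustration.

```python
def build_octree(origin, size, occupied, min_size=1):
    """Return 'empty', 'full', or a list of eight child nodes.

    origin: (x, y, z) corner of the cube; size: edge length (a power of two);
    occupied: set of unit-voxel coordinates that contain the object.
    """
    x0, y0, z0 = origin
    inside = sum(1 for (x, y, z) in occupied
                 if x0 <= x < x0 + size
                 and y0 <= y < y0 + size
                 and z0 <= z < z0 + size)
    if inside == 0:
        return "empty"
    if inside == size ** 3 or size <= min_size:
        return "full"          # homogeneous volume: no further subdivision
    half = size // 2
    return [build_octree((x0 + dx, y0 + dy, z0 + dz), half, occupied, min_size)
            for dx in (0, half) for dy in (0, half) for dz in (0, half)]


def count_nodes(node):
    """Number of nodes in the tree, counting leaves and internal nodes."""
    return 1 if isinstance(node, str) else 1 + sum(count_nodes(c) for c in node)


# Hypothetical 4x4x4 volume containing one solid 2x2x2 corner block.
voxels = {(x, y, z) for x in range(2) for y in range(2) for z in range(2)}
tree = build_octree((0, 0, 0), 4, voxels)
print(count_nodes(tree))  # 9 nodes, compared with 64 voxels
```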


odd field: Standard interlaced scanning 
transmits all of the even scan lines in 
an image frame first and then all of the 
odd lines. The set of odd lines is the 
odd field. [Jai89:11.1] 


off-axis imaging: Image formation 
based on light which arrives from 
directions other than along the optical 
axis. This may have benefits in 
reducing aberration effects but also 
means reduced brightness because of 
vignetting. 


O’Gorman edge detector: A parametric
edge detector in which a decomposition
of the image and model by orthogonal
Walsh function masks was used
to compute the step edge parameters
(contrast and orientation). One
advantage of the parametric model is
a goodness-of-model fit as well as the
edge contrast that increases the reliability
of the detected edges. [OG78]


omni-directional camera: A camera
which can view all directions at once.
Typically refers to a camera which can
see in all directions on the ground
plane (but not necessarily upwards or
downwards). Often created by using a
standard camera looking at a spherical
mirror. [WP:Omnidirectional_camera]


omni-directional sensing: Literally,
sensing all directions simultaneously.
In practice, this means using mirrors
and lenses to project most of the lines
of sight at a point onto a single camera
image. The space behind the mirrors
and cameras is typically not visible.
See also catadioptric optics. In this
figure, a camera using a spherical mirror
achieves a very wide field of view:
[WP:Omnidirectional_camera]


omni-directional stereo: Stereo vision
based on two panoramic images. These
images are normally created by rotating
a stereo pair of cameras. [IYT92]


omni-directional vision: Any vision
system using omni-directional
cameras. [WGLS00]


online anomaly detection: An automated
method for the location of
interesting or unusual events typically
within a continuous live data
stream such as video. See also anomaly
detection. [XG08]


online filtered inference: The inference
of current state information (e.g.,
position) using techniques such as 
Kalman filtering or particle filtering, 
typically in a continuous (live) data 
stream. [Mur12:15.4.1] 


online learning: A form of learning 
where the parameter updates are 
made each time a new training exam- 
ple is received. Contrast with batch 
learning. See also incremental learning. 
[HTF08:11.4]


online likelihood ratio test: A form 
of likelihood ratio test where the data 
arrives sequentially. 


online model adaptation: Updating a 
model to deal with changing circum- 
stances (e.g., altering a mean shift 
model to cope with scale changes of 
the object being tracked in a video). 
[SP02] 


online processing: The processing of 
data when requested or when the data 
is acquired (as distinct from real-time 
or batch processing). See also real-time 
processing. [KH02] 


online video screening: A model for 
detecting irregularities in a video 
stream, e.g., vehicles performing ille- 
gal actions. [HGX09] 


one-class learning: Learning with the 
goal of taking as input a set of data 
points drawn from a probability distri- 
bution P and producing as output a 
“simple” subset S of the input space 
such that the probability that a test 
point drawn from P lies outside S 
is equal to some specified probability.
One approach to this is through
the one-class support vector machine.
Another method would be to carry
out probability density estimation and
define S in terms of a probability
contour. One-class learning can be
used for outlier rejection and anomaly
detection. [SS02:Ch. 8]


one-shot learning: Learning about a 
class based on only a few data points 
from that class. This can be possible 
by exploiting knowledge transfer from 
other classes that have already been 
learned. [FFP06] 


one-versus-rest classification: In a 
multiclass classification problem with 
K > 2 classes, construct K one-versus- 
rest problems, each classifying one 
class against data from all the rest. 
A test point is classified by select- 
ing the class with the largest strength 
of classifier output. This approach is 
commonly used for multiclass support 
vector machines. [HTF08:18.3.3] 
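The decision rule can be sketched as follows. The
three linear scoring functions stand in for trained
one-versus-rest classifiers; their weights are
hypothetical and chosen only to make the example
self-contained.

```python
def linear_score(weights, bias):
    """Return a scoring function x -> w.x + b (stand-in for a trained classifier)."""
    return lambda x: sum(w * xi for w, xi in zip(weights, x)) + bias


def one_vs_rest_predict(classifiers, x):
    """Select the class whose classifier gives the largest output for x."""
    scores = [score(x) for score in classifiers]
    return max(range(len(scores)), key=lambda k: scores[k])


# K = 3 hypothetical classifiers, one per class.
classifiers = [
    linear_score([1.0, 0.0], 0.0),    # class 0: large first coordinate
    linear_score([0.0, 1.0], 0.0),    # class 1: large second coordinate
    linear_score([-1.0, -1.0], 0.0),  # class 2: both coordinates negative
]

print(one_vs_rest_predict(classifiers, (2.0, 0.5)))
```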


opaque: When light cannot pass through
a structure. This causes shadows and
occlusion. [WP:Opacity_(optics)]


open operator: A mathematical 
morphology operation applied to 
a binary image. The operator is a 
sequence of N erode operators fol- 
lowed by N dilate operators, both 
using a specified structuring element.
The operator is useful for separating
touching objects and removing small 
regions. In the figure, the image on 
the right was created by opening the 
one on the left with an 11-pixel disk 
kernel: [SB11:8.15] 
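A minimal Python sketch of the open operator
follows. It assumes a 3x3 square structuring element
and treats pixels outside the image as 0; both are
choices made for this illustration, and the function
names are hypothetical.

```python
def _apply(image, combine):
    """Apply `combine` (all or any) over each pixel's 3x3 neighborhood."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [image[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0
                      for rr in (r - 1, r, r + 1) for cc in (c - 1, c, c + 1)]
            out[r][c] = int(combine(window))
    return out


def erode(image):
    return _apply(image, all)   # 1 only if the whole neighborhood is 1


def dilate(image):
    return _apply(image, any)   # 1 if any neighbor is 1


def open_image(image, n=1):
    """N erosions followed by N dilations."""
    for _ in range(n):
        image = erode(image)
    for _ in range(n):
        image = dilate(image)
    return image


image = [[0, 0, 0, 0, 0, 0, 0],
         [0, 1, 1, 1, 0, 0, 0],
         [0, 1, 1, 1, 0, 1, 0],   # isolated pixel at column 5
         [0, 1, 1, 1, 0, 0, 0],
         [0, 0, 0, 0, 0, 0, 0]]
for row in open_image(image):
    print(row)
```

In this example the 3x3 block survives the opening
while the isolated pixel is removed.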


open set recognition: The recognition 
of an object where the set of possibili- 
ties is not closed, i.e., it is possible that 
the object is unknown to the system. 
See also closed set recognition. [FW05] 


operator: A general term for a function 
that is applied to some data in order to
transform it in some way. For example
see image-processing operator.
[Gal90:Ch. 5]


opponent color: A color representation 
system, originally developed by Her- 
ing, in which an image is represented 
by three channels with contrasting 
colors: red-green, yellow-blue, and 
black-white. [BB82:2.2.5] 


optical axis: The ray, perpendicular to 
the lens and through the optical center, 
around which the lens is symmetrical: 
[FP03:1.1.1] 


[Figure labels: focal point, optical axis]


optical center: See focal point. 


optical character recognition (OCR): 
A general term for extracting an alpha- 
betic text description from an image 
of text. Common specialisms include 
bank numerals, handwritten digits, 
handwritten characters, cursive text, 
Chinese characters, Arabic characters 
etc. [JKS95:2.7] 


optical coherence tomography: A 
method of acquiring a 3D image 
of the top 1-2 mm of an object, 
such as some biological tissue, 
using near-infrared interferometry. 
[WP:Optical_coherence_tomography] 


optical flow: An instantaneous veloc- 
ity measurement for the direction and 
speed of the image data across the 
visual field. This can be observed at 
every pixel, creating a field of velocity 
vectors. The set of apparent motions 
of the image pixel brightness values. 
[FP03:25.4] 


optical flow boundary: A boundary 
between two regions where the 
optical flow is different in direction or 
magnitude. The regions can arise from 
objects moving in different directions 
or surfaces at different depths. See also 
optical flow field segmentation. In the 
figure, the dashed line is the boundary 


between optical flow moving left and 
optical flow moving right: [PGPO94]


optical flow constraint equation:
The equation
$\frac{\partial I}{\partial t} + \nabla I \cdot \vec{u}_x = 0$
that links the observed change in image
I’s intensities over time $\frac{\partial I}{\partial t}$ at image
position x to the spatial change in
pixel intensities at that position $\nabla I$
and the velocity $\vec{u}_x$ of the image data
at that pixel. The constraint does
not completely determine the image
motion, as this has two degrees of
freedom. The equation provides only
one constraint, thus leading to an
aperture problem. [WP:Optical_
flow#Estimation_of_the_optical_flow]
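The single constraint fixes only the flow component
along the intensity gradient (the normal flow). The
sketch below is an illustration with hypothetical
derivative values; it solves the constraint for that
component and shows that the remainder of the true
motion is perpendicular to the gradient.

```python
def normal_flow(ix, iy, it):
    """Flow component along the gradient (ix, iy) given temporal derivative it.

    Solves it + ix*u + iy*v = 0 for the (u, v) parallel to the gradient.
    """
    mag2 = ix * ix + iy * iy
    if mag2 == 0.0:
        return (0.0, 0.0)   # no gradient: the constraint gives no information
    scale = -it / mag2
    return (scale * ix, scale * iy)


# Hypothetical pixel: gradient (3, 4) and true image motion (1, 2),
# so the temporal derivative is -(3*1 + 4*2) = -11.
gradient = (3.0, 4.0)
true_motion = (1.0, 2.0)
i_t = -(gradient[0] * true_motion[0] + gradient[1] * true_motion[1])

u, v = normal_flow(gradient[0], gradient[1], i_t)
print(u, v)

# The unrecovered part of the motion is orthogonal to the gradient.
residual = (true_motion[0] - u, true_motion[1] - v)
orthogonality = residual[0] * gradient[0] + residual[1] * gradient[1]
```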


optical flow field: The field composed 
of the optical flow vector at each pixel 
in an image. [FP03:25.4] 


optical flow field segmentation: The 
segmentation of an optical flow image 
into regions where the optical flow has 
a similar direction or magnitude. The 
regions can arise from objects moving 
in different directions or surfaces at 
different depths. See also optical flow 
boundary. [HPZC95] 


optical flow region: A region where 
the optical flow has a similar direction 
or magnitude. Regions can arise from 
objects moving in different directions, 
or surfaces at different depths. See also 
optical flow boundary. [HS81] 


optical flow smoothness constraint: 
The constraint that nearby pixels in 
an image usually have similar optical 
flow because they usually arise from 
projection of adjacent surface patches 
having similar motions relative to 
the observer. The constraint can be 
relaxed at the optical flow boundary. 
[Sny89] 


optical image processing: An
image-processing technique in
which the processing occurs by use 
of lenses and coherent light instead 
of by a computer. The key principle 
is that a coherent light beam that 
passes through a transparency of 
the target image and is then focused 
produces the Fourier transform of 
the image at the focal point where 
frequency domain filtering can occur. 
The figure shows a typical processing 
arrangement: [PB01:Ch. 7]


optical process: A process that uses
light and lenses. [WP:Optics]


optical transfer function (OTF): Informally,
the OTF is a measure of how well
spatially varying patterns are observed
by an optical system. More formally, in
a 2D image, let $X(f_h, f_v)$ and $Y(f_h, f_v)$
be the Fourier transforms of the input
$x(h, v)$ and output $y(h, v)$ images.
Then, the OTF of a horizontal and vertical
spatial frequency pair $(f_h, f_v)$ is
$H(f_h, f_v)/H(0, 0)$, where
$H(f_h, f_v) = Y(f_h, f_v)/X(f_h, f_v)$.
The optical transfer function is usually
a complex number encoding both the
reduction in signal strength at each
spatial frequency and the phase shift.
[SB11:5.11]


optics: A general term for the manipulation
and transformation of light
and images using lenses and mirrors. 
[JKS95:Ch. 8] 


optimal basis encoding: A general
technique for encoding image or
other data by projecting onto some 
basis functions of a linear space and 
then using the projection coefficients 
instead of the original data. Opti- 
mal basis functions produce projec- 
tion coefficients that allow the best dis- 
crimination between different classes 
of objects or members in a class (such 
as for face recognition). [LW00a]


optimization: A general term for finding
the values of the parameters that
maximize or minimize some quantity. 
[BB82:11.1.2] 


optimization parameter estimation: 
See optimization. 


OR operator: A pixelwise logical operator
defined on binary variables. It takes
as input two binary images, $I_1$ and
$I_2$, and returns an image $I_3$, in which
the value of each pixel is 0 if both $I_1$
and $I_2$ are 0, and 1 otherwise. In the
figure, the image on the right shows
the result of ORing the other two
images (note that the white pixels have
value 1): [SB11:3.2.2]


order statistics filter: A filter based on 
order statistics, a technique that sorts 
the pixels of a neighborhood by inten- 
sity value and assigns a rank (the posi- 
tion in the sorted sequence) to each. 
An order statistics filter replaces the 
central value of the filtering neighbor- 
hood with the value at a given rank in 
the sorted list. A popular example is 
the median filter. As this filter is less 
sensitive to outliers, it is often used 
in robust statistics processes. See also 
rank order filtering. [Umb98:3.3.1] 
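A minimal Python sketch of an order statistics
filter follows. Leaving border pixels unchanged and
using 0-based ranks are choices made for this
illustration; the function names and the sample image
are hypothetical.

```python
def order_statistics_filter(image, rank, size=3):
    """Replace each interior pixel by the value at `rank` (0-based) in its
    sorted size x size neighborhood; border pixels are copied unchanged."""
    half = size // 2
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            window = sorted(image[rr][cc]
                            for rr in range(r - half, r + half + 1)
                            for cc in range(c - half, c + half + 1))
            out[r][c] = window[rank]
    return out


def median_filter(image, size=3):
    """The median filter is the order statistics filter at the middle rank."""
    return order_statistics_filter(image, rank=(size * size) // 2, size=size)


noisy = [[10, 10, 10, 10],
         [10, 255, 12, 10],
         [10, 11, 10, 10],
         [10, 10, 10, 10]]
print(median_filter(noisy)[1][1])  # 10: the outlier 255 is suppressed
```

Rank 0 gives a minimum filter and the highest rank a
maximum filter, using the same function.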


ordered texture: See macrotexture. 




ordering: Sorting a collection of objects 
by a given property, e.g., intensity 
values, in an order statistics filter. 
[Umb98:3.3.1]


ordering constraint: A stereo vision 
constraint stating that two points 
which appear in a particular order in 
one image are likely to appear in the 
same order in the other stereo image. 
This constraint will fail if the points 
are from objects at different depths, 
should they exhibit parallax. [CM92] 


orderless images: A form of image in 
which a probability distribution is associated
with each point rather than an
intensity or color. The probability distribution
is computed from a local area
around each point, ignoring the spatial 
structure. [KD99] 


ordinal transformation: A one-to-one 
mapping of a set of values to another 
set of values such that the ordering 
relations are maintained. An example 
transformation replaces a value by its 
index in a sorted list of all values. 
For example, [36,20,19,42,58] is trans- 
formed to [3,2,1,4,5], which can be 
processed, e.g., by median filtering and 
then re-transformed back to the origi- 
nal domain. 
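The example transformation can be reproduced with
the short Python sketch below. It assumes that all
values are distinct, and the function names are
hypothetical.

```python
def ordinal_transform(values):
    """Map each value to its 1-based index in the sorted list of values.

    Assumes the values are distinct. Also returns the sorted list, which is
    needed to map ranks back to the original domain.
    """
    order = sorted(values)
    rank = {v: i for i, v in enumerate(order, start=1)}
    return [rank[v] for v in values], order


def inverse_ordinal_transform(ranks, order):
    """Map 1-based ranks back to the original values."""
    return [order[r - 1] for r in ranks]


ranks, order = ordinal_transform([36, 20, 19, 42, 58])
print(ranks)                                    # [3, 2, 1, 4, 5]
print(inverse_ordinal_transform(ranks, order))  # [36, 20, 19, 42, 58]
```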


Oren-Nayar model: A diffuse reflec- 
tance model for rough surfaces. 
[ON94] 


orientation: The property of being 
directed towards or facing a particu- 
lar region of space or a line; also, the 
pose or attitude of a body in space. 
For instance, the orientation of a vec- 
tor (where the vector points to) is 
specified by its unit vector; the ori- 
entation of an ellipsoid is specified 
by its principal directions; the orien- 
tation of a wire-frame model is speci- 
fied by its own reference frame with 
respect to a world reference frame. 
[WP:Orientation_(computer_vision)] 


orientation error: The amount of error 
associated with an orientation value. 
[VG96] 


orientation representation: See pose 
representation. 


oriented projective geometry: A ver- 
sion of projective geometry where 
orientations may be associated with 
lines, e.g., the direction in which 
light is traveling. [WP:Oriented_ 
projective_geometry] 


oriented smoothness: A con- 
straint which restricts changes in 
displacement vectors between two 
images in directions which have little 
gray scale variation. [Nag90] 


oriented texture: A texture in which a 
preferential direction can be detected. 
For instance, the direction of the 
bricks in a regular brick wall. See 
also texture direction and texture 
orientation. [RS91] 


orthogonal Fourier-Mellin moment
invariants: Moment invariants based
on Fourier-Mellin moments or rotational
moments. [ZSH+10a]


orthogonal image transform: A well- 
known class of techniques for image 
compression. The key process is the 
projection of the image data onto a 
set of orthogonal basis functions. See, 
e.g., the discrete cosine transform, 
the Fourier transform and the Haar 
transform. This is a special case of the 
linear integral transform. [Cla85] 


orthogonal regression: Also known as
total least squares. Traditionally seen as
the generalization of linear regression
to the case where both x and y are measured
quantities and subject to error.
Given samples $x_i$ and $y_i$, the objective
is to find estimates of the “true” points
$(\hat{x}_i, \hat{y}_i)$, and line parameters (a, b, c)
such that $a\hat{x}_i + b\hat{y}_i + c = 0, \forall i$, and
such that the error
$\sum_i \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right]$
is minimized. This estimate is easily
obtained as the line (plane etc.,
in higher dimensions) passing through
the centroid of the data, with its normal
in the direction of the eigenvector of the
data scatter matrix that has the smallest
eigenvalue. [WP:Total_least_squares]
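For the 2D case the estimate can be sketched without
a linear algebra library, because the eigenvectors
of a 2x2 scatter matrix have a closed form. The
function name and sample points are hypothetical.

```python
import math


def fit_line_tls(points):
    """Fit a*x + b*y + c = 0 (with a^2 + b^2 = 1) by orthogonal regression."""
    n = len(points)
    mx = sum(x for x, _ in points) / n           # centroid
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)  # scatter matrix entries
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    # Angle of the eigenvector with the largest eigenvalue (line direction).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    # The line normal is the perpendicular eigenvector (smallest eigenvalue).
    a, b = -math.sin(theta), math.cos(theta)
    c = -(a * mx + b * my)                       # line passes through centroid
    return a, b, c


# Hypothetical noise-free samples of y = 2x + 1.
line_points = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
a, b, c = fit_line_tls(line_points)
print(a, b, c)

# A vertical line, which ordinary regression of y on x cannot represent.
vertical_points = [(2.0, 0.0), (2.0, 1.0), (2.0, 5.0)]
va, vb, vc = fit_line_tls(vertical_points)
```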


orthographic: The characteristic prop- 
erty of orthographic (or perpen- 
dicular) projection onto the image 
plane. See orthographic projection. 
[FP03:2.3] 


orthographic camera: A camera in 
which the image is formed accord- 
ing to an orthographic projection. 
[FP03:2.3] 


orthographic projection: Rendering of 
a 3D scene as a 2D image by a set of rays 
orthogonal to the image plane. The size 
of the objects imaged does not depend 
on their distance from the viewer. As 
a consequence, parallel lines in the 
scene remain parallel in the image. The 
equations of orthographic projections 
are 


x=X y=Y 


where x, y are the image coordinates
of an image point in the camera reference
frame (that is, in millimeters,
not pixels) and X, Y, Z are the coordinates
of the corresponding scene
point: [FP03:2.3]


orthoimage: In photogrammetry, the
warp of an aerial photograph to 
an approximation of the image that 
would have been taken had the 
camera pointed directly downwards. 
See also orthographic projection. 
[WP:Orthophoto] 


orthonormal: A property of a set of 
basis functions or vectors. If <, > is the 
inner product function and a and b are 
any two different members of the set, 
then we have <a,a> = <b,b> = 1
and <a,b> = 0. [WP:Orthonormal_
basis] 


OTF: See optical transfer function. 


outlier: Exception points in a set of data 
that mostly conforms to some regu- 
lar process or is well represented by 
a model. Classifying points as outliers 
depends both on the models used and 
the statistics of the data. This figure 
shows a line fit to some points and an 
outlying point: [CS09:3.4.6]


outlier detection: The identification of
samples or points which lie outside the 
main distribution. [FP03:15.5] 


outlier rejection: Identifying outliers 
and removing them from the current 
process. Identification is often a diffi- 
cult process. [CS09:3.4.6] 


out-of-focus blur: The fuzziness (blur)
which appears in an image as a result of
the object which is viewed being outside
the depth of field of the camera:
[KPP98]


over-fitting: A model is said to over-fit
the training set if its performance on
an independent test set drawn from
the same data-generating distribution
is worse than on the training set.
[Mur12:1.2.6]


oversampling: The sampling density of
a sampled continuous signal is greater
than the Nyquist sampling rate.
[WP:Oversampling]


over-segmented: The output of a
segmentation algorithm in which the
desired regions are represented by
too many regions. In the figure, the
image should be segmented into three
regions but it was oversegmented into
five regions: [SQ04:8.7]


paired boundaries: See paired
contours.


paired contours: A pair of contours 
occurring together in images and 
joined by a spatial relationship, e.g., 
the contours generated by river banks 
in aerial images or the contours of a 
human limb:

Co-occurrence can be exploited to 
make contour detection more robust. 
See also feature extraction. [HF94] 


pair-wise correlation: See correlation. 


pairwise geometric histogram 
(PGH): A line- or edge-based shape 
representation used for object 
recognition, especially 2D. Histograms 
are built by computing, for each 
line segment, the relative angle and 
perpendicular distance to all other seg- 
ments. The representation is invariant 
to rotation and translation. A PGH can 
be compared using the Bhattacharyya 
distance. [AFRW98] 


PAL camera: A camera conforming to 
the European phase alternation by line 
(PAL) standard. See also NTSC, RS-170 
and CCIR camera. [Jai89:4.1] 


palette: The range of colors available. 
[NA05:2.2] 


pan: Rotation of a camera about a sin- 
gle axis through the camera center and 
(approximately) parallel to the image 
vertical: [WP:Panning_(camera)] 


pan-tilt-zoom (PTZ): A camera for 
which the direction (pan and tilt) 
in which it points and the field of 
view (zoom) can be controlled elec- 
tronically. Very common in the visual 
surveillance domain. [WM96] 


panchromatic: Sensitive to light 
of all visible wavelengths. [WP: 
Panchromatic_film] 


panchromatic images: Gray scale 
images which are derived from all 
wavelengths of visible light. [GSCG04] 


panoramic: Associated with a wide field 
of view often created or observed 
by a panned camera. [WP:Panoramic_ 
photography] 


panoramic image mosaic: A class of 
techniques for collating a set of par- 
tially overlapping images into a single 
panoramic image. The figure shows a
mosaic built from the frames of a hand-held
camera sequence:

Typically, the mosaic yields both very 
high resolution and a large field of 
view, which cannot be simultaneously 
achieved by a physical camera. There 
are several ways to build a panoramic 
mosaic but, in general, there are three 
necessary steps: determine correspon- 
dences (see stereo correspondence 
problem) between adjacent images; 
use the correspondences to find a 
warping transformation between the 
two images (or between the current 
mosaic and a new image); blend the 
new image into the current mosaic. 
[SS97b] 


panoramic image stereo: A stereo sys- 
tem working with a very large field of 
view, say 360° in azimuth and 120° in 
elevation. Disparity maps and depths 
are recovered for the whole field of 
view simultaneously. A normal stereo 
system would have to be moved and 
the results registered to achieve the 
same result. See also binocular stereo, 
multi-view stereo and omni-directional 
sensing. [PBP01] 


Pantone matching system (PMS): A 
color matching system used by the 
printing industry to print spot col- 
ors. Colors are specified by Pantone 
name or number. PMS works well for 
spot colors but not for process colors, 
usually specified by the CMYK color 
model. [WP:Pantone] 


Panum’s fusional area: The region of 
space within which single vision is pos- 
sible (that is, you do not perceive dou- 
ble images of objects) when the eyes 
fixate on a given point. [CS09:6.7.1.1] 


parabolic point: A point on a smooth
surface where the Gaussian curvature
is zero but the surface is not locally
planar, i.e., exactly one principal
curvature is zero. See also mean and Gaussian
curvature shape classification.
[Nal93:9.2.5]


parallax: The angle between the two 
straight lines that join a point (possibly 
a moving one) to two viewpoints.
In motion analysis, motion parallax 
occurs when two scene points that 
project to the same image point at one 
viewpoint later project to different 
points as the camera moves. The 
vector between the two new points is 
the parallax: [TV98:8.2.4] 


[Figure: parallax between the initial camera position and the final camera position]


parallel processing: An algorithm is 
executed in parallel, or through paral- 
lel processing, when it can be divided 
into a number of computations that are 
performed simultaneously on separate 
hardware. See also single instruction 
multiple data, multiple instruction 
multiple data, pipeline parallelism and 
task parallelism. [BB82:10.4.1] 


parallel projection: A generalization 
of orthographic projection in which 
a scene is projected onto the image 
plane by a set of parallel rays not 
necessarily perpendicular to the image 
plane. This is a good approximation 
of perspective projection, up to a uni- 
form scale factor, when the scene is 
small in comparison to its distance 
from the center of projection. Par- 
allel projection is a subset of weak 
perspective viewing, where the weak 
perspective projection matrix is sub- 
ject not only to orthogonality of the 
rows of the left 2 × 3 submatrix, but
also to the constraint that the rows 
have equal norm. In orthographic pro- 
jection, both rows have unit norm. 
[FP03:2.3.1] 


parameter estimation: A class of tech- 
niques aimed at estimating the param- 
eters of a given parametric model. For 
instance, assuming that a set of image 


points lie on an ellipse and considering
the implicit ellipse model $ax^2 + bxy + cy^2 + dx + ey + f = 0$,
the parameter vector $[a, b, c, d, e, f]$ can be
estimated, e.g., by least square surface
fitting. [DH73:3.1]
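
As an illustration, the ellipse example can be sketched as a linear least-squares problem. This is only a sketch under stated assumptions: it uses NumPy, the function name `fit_conic` is hypothetical, and the scale ambiguity of the implicit model is removed by fixing f = -1, which fails for conics passing through the origin.

```python
import numpy as np

def fit_conic(x, y):
    """Least-squares fit of a x^2 + b xy + c y^2 + d x + e y + f = 0 with f fixed to -1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Each row holds the monomials that multiply [a, b, c, d, e].
    design = np.column_stack([x * x, x * y, y * y, x, y])
    target = np.ones_like(x)  # f = -1 moved to the right-hand side
    params, *_ = np.linalg.lstsq(design, target, rcond=None)
    return np.append(params, -1.0)

# Noise-free points on the ellipse x^2/9 + y^2/4 = 1.
angles = np.linspace(0.0, 2.0 * np.pi, 20, endpoint=False)
theta = fit_conic(3.0 * np.cos(angles), 2.0 * np.sin(angles))
```

For these noise-free points the recovered vector is close to [1/9, 0, 1/4, 0, 0, -1]. With noisy data, other normalizations of the parameter vector may behave better.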


parametric edge detector: An edge 
detection technique that seeks to 
match image data using a parametric 
model of edge points and thus detects 
edges when the image data fits the 
edge model well. See Hueckel edge 
detector. [Nal93:3.1.3] 


parametric mesh: A type of surface- 
modeling primitive for 3D models in 
which the surface is defined by a mesh 
of points. A typical example is
non-uniform rational B-splines (NURBS).
[GK04] 


parametric model: A mathematical 
model expressed as a function of a set of
parameters, e.g., the parametric equa- 
tion of a curve or surface (as opposed 
to its implicit form), or a paramet- 
ric edge model (see parametric edge 
detector). [Nal93:3.1.3] 


parametric warps: Transformations 
that distort or correct images using a 
single global parametric function, e.g., 
a perspective transformation. [GM98] 


paraperspective: An approximation of 
perspective projection, whereby a 
scene is divided into parts that 
are imaged separately by parallel 
projection with different parameters. 
[FP03:2.3.1-2.3.3] 


Pareto optimal: For a nontrivial
multi-objective optimization problem
there is no single solution that
simultaneously optimizes every objective.
A point is Pareto optimal if it is not
possible to improve any of the
objectives further without at the same
time worsening another. The solution
to the multi-objective problem is a
possibly infinite set of Pareto optimal points.
[BV04:4.7]
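
For a finite set of candidate solutions, the Pareto optimal points can be found by a direct dominance test. The sketch below assumes every objective is to be minimized; the function name `pareto_front` and the candidate values are illustrative only.

```python
def pareto_front(points):
    """Return the points not dominated by any other point (all objectives minimized)."""
    def dominates(p, q):
        # p dominates q if it is no worse in every objective and strictly better in one.
        return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))
    return [p for p in points if not any(dominates(q, p) for q in points)]

candidates = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 2)]
front = pareto_front(candidates)
```

Here (3, 4) is dominated by (2, 3) and (5, 2) by (4, 1), so three candidates remain on the front.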


part-based representation: A model 
that treats objects as a collection of 
parts. For example, a person can be 
considered to be composed of a head, 
a trunk, two arms and two legs. 
[Sze10:14.4.2] 


part recognition: A class of tech- 
niques for recognizing assemblies or 
articulated objects from their subcom- 
ponents (parts), e.g., a human body 
from head, trunk, and limbs. Parts 
have been represented by 3D mod- 
els, such as generalized cones and 
superquadrics. In industrial contexts, 
part recognition indicates the recogni- 
tion of specific items (parts) in a pro- 
duction line, typically for classification 
and quality control. [CFH05] 


part segmentation: A class of tech- 
niques for partitioning a set of data into 
components (parts) with an identity 
of their own, e.g., a human body into 
limbs, head and trunk. Part segmenta- 
tion methods exist for both 2D and 3D 
data, that is, intensity images and range 
images, respectively. Various geomet- 
ric models have been adopted for 
the parts, e.g., generalized cylinders, 
superellipses and superquadrics. See 
also articulated object segmentation. 
[BM02:6.2.2] 


partial volume interpolation: A tech- 
nique for performing interpolation of 
voxel values (e.g., MRI values) in 
which the interpolated values are not 
determined by averaging but rather 
by distributing existing local values. 
[HV03] 


partially constrained pose: A situation 
whereby an object is subject to a num- 
ber of constraints restricting the num- 
ber of admissible orientations or posi- 
tions, but not fixing one univocally. For 
instance, cars on a road are constrained 
to rotate around an axis perpendicular 
to the road. [WOFH93] 


partially observable Markov decision
process (POMDP): A generalization of
the Markov decision process where
the agent cannot directly observe the
underlying state, but where the
observations made relate stochastically to
that state. [TBF05:Ch. 15]


particle counting: An application of 
particle segmentation to counting 
the instances of small objects (parti- 
cles), such as pebbles, cells or water 
droplets, in images or sequences: 
[WP:Particle_counter]


particle filter: A tracking strategy where
the probability density of the model 
parameters is represented as a set of 
particles. A particle is a single sam- 
ple of the model parameters, with an 
associated weight. The probability den- 
sity represented by the particles is typ- 
ically a set of delta functions or a set 
of Gaussians with means at the parti- 
cle centers. At each tracking iteration, 
the current set of particles represents a 
prior on the model parameters, which 
is updated via a dynamical model and 
observation model to produce the new 
set representing the posterior distribu- 
tion. See also condensation tracking. 
[WP:Particle_filter] 
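
One tracking iteration can be sketched for a scalar state as follows. The random-walk dynamical model, the Gaussian observation model, the resampling step and all names and noise levels are assumptions chosen for illustration (a bootstrap-style filter), not a prescribed design.

```python
import math
import random

def particle_filter_step(particles, weights, observation, motion_std, obs_std, rng):
    """One predict/update/resample cycle of a 1D bootstrap particle filter."""
    # Predict: push each particle through a random-walk dynamical model.
    predicted = [p + rng.gauss(0.0, motion_std) for p in particles]
    # Update: reweight each particle by a Gaussian observation likelihood.
    likelihood = [math.exp(-0.5 * ((observation - p) / obs_std) ** 2) for p in predicted]
    updated = [w * l for w, l in zip(weights, likelihood)]
    total = sum(updated)
    if total == 0.0:  # every particle is far from the observation: fall back to uniform
        updated = [1.0] * len(predicted)
        total = float(len(predicted))
    updated = [w / total for w in updated]
    # Resample in proportion to weight, then reset the weights to uniform.
    resampled = rng.choices(predicted, weights=updated, k=len(predicted))
    uniform = [1.0 / len(resampled)] * len(resampled)
    return resampled, uniform

rng = random.Random(0)
true_state = 5.0
particles = [rng.uniform(0.0, 10.0) for _ in range(500)]
weights = [1.0 / len(particles)] * len(particles)
for _ in range(30):
    observation = true_state + rng.gauss(0.0, 0.5)
    particles, weights = particle_filter_step(particles, weights, observation, 0.1, 0.5, rng)
estimate = sum(p * w for p, w in zip(particles, weights))
```

The weighted mean of the particles serves as the state estimate; after a few iterations the particle set concentrates around the observed state.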


particle flow tracking: Determining 
the local flow by attempting to track 
individual local particles, e.g., smoke 
particles, in a flow. Groups of particles 
generally exhibit similar flow, although 
individual particles may not be partic- 
ularly stable. [MEYD11] 


particle segmentation: A class of 
techniques for detecting individual 
instances of small objects (particles), 
such as pebbles, cells or water 
droplets, in images or sequences. A typ- 
ical problem is severe occlusion caused 
by overlapping particles. This problem 
has been approached successfully with 
the watershed transform. [KMMH07] 


particle swarm optimization: An 
approach to optimization based on 
having a number of particles whose 
dynamics in the search space depend 
on rules originally inspired by the
simulation of animal social behavior. Each
particle has a pair of update functions
as follows:


$v_{i,t+1} = v_{i,t} + c_1 \, \mathrm{rand}() \, (b_i - p_{i,t}) + c_2 \, \mathrm{rand}() \, (P_g - p_{i,t})$

$p_{i,t+1} = p_{i,t} + v_{i,t}$


where $p_{i,t}$ is the position at time $t$ of
particle $i$; $v_{i,t}$ is its velocity; $b_i$ is its
best recorded position; $P_g$ is the position
of the particle in the swarm achieving
the best evaluation score; $f(p)$ is
the evaluation function mapping particle
positions to values; and rand()
is a randomly generated floating point
number in the range [0, 1]. The pair
of parameters $c_1$ and $c_2$ are weightings
between the importance of local and
global best values. [PKB07]
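
A minimal sketch of such a swarm, minimizing a simple evaluation function, is given below. Two liberties are assumptions rather than part of the entry: an inertia weight that damps the previous velocity, and a position update that uses the freshly updated velocity (a commonly used variant). All names and parameter values are illustrative.

```python
import random

def particle_swarm_minimize(f, dim, n_particles=20, n_iters=100,
                            c1=1.5, c2=1.5, inertia=0.7, seed=0):
    """Minimize f over R^dim with a basic particle swarm."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # best recorded position of each particle
    best_val = [f(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    swarm_pos, swarm_val = best_pos[g][:], best_val[g]   # best position in the swarm
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                # Velocity update: pull towards the local and the global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * rng.random() * (best_pos[i][d] - pos[i][d])
                             + c2 * rng.random() * (swarm_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]        # assumption: uses the updated velocity
            val = f(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < swarm_val:
                    swarm_pos, swarm_val = pos[i][:], val
    return swarm_pos, swarm_val

def sphere(p):
    """Evaluation function with its minimum (zero) at the origin."""
    return sum(x * x for x in p)

solution, score = particle_swarm_minimize(sphere, dim=2)
```

Setting `inertia=1.0` removes the damping; the swarm is then more prone to oscillation.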


particle tracking: See condensation
tracking.


Parzen window: 1) A non-parametric
method for density estimation. Given
a sample $\{x_i\}$ with $i = 1, \ldots, n$ drawn
from some underlying density, the
Parzen window estimate is obtained
as $p(x) = \frac{1}{n} \sum_{i=1}^{n} K(x_i, x)$ where $K$
is a kernel function. The Gaussian
distribution is one common choice for
the kernel. [DH73:4.3] [PTVF92:13.4]
2) A triangle-shaped weighting window
used to limit leakage to spurious
frequencies when computing the
power spectrum of a signal.
See also windowing and Fourier
transform. [DH73:4.3]
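
The density estimate of sense 1 can be sketched with a Gaussian kernel as follows. The bandwidth value, the sample values and the function names are assumptions made for illustration.

```python
import math

def gaussian_kernel(xi, x, bandwidth):
    """Gaussian density centered on the sample xi, evaluated at x."""
    z = (x - xi) / bandwidth
    return math.exp(-0.5 * z * z) / (bandwidth * math.sqrt(2.0 * math.pi))

def parzen_estimate(samples, x, bandwidth=0.5):
    """Average of the kernels centered on each sample."""
    return sum(gaussian_kernel(xi, x, bandwidth) for xi in samples) / len(samples)

samples = [-1.2, -0.8, -0.1, 0.0, 0.3, 0.9, 1.1]
density_at_zero = parzen_estimate(samples, 0.0)
density_far_away = parzen_estimate(samples, 10.0)
```

Because each kernel integrates to one, so does the estimate. The bandwidth controls the smoothness of the result and usually has to be tuned to the data.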


passive sensing: A sensing process that
does not emit any stimulus or where 
the sensor does not move. A nor- 
mal stationary camera is a passive 
sensor. Compare with active sensing. 
[Nal93:1.1] 


passive stereo algorithm: An algo- 
rithm that uses only the informa- 
tion obtainable by a stationary set 
of cameras and ambient illumination. 
This contrasts with the active vision 
paradigm in stereo, where the cam- 
era(s) might move or some projected 
stimulus might be used to help solve 
the stereo correspondence problem. 
[BM02:1.9.2] 


patch classification: The problem of 
attributing a surface patch to a partic- 
ular class in a shape catalog, typically 
computed from dense range data using 
curvature estimates or shading. See 
also curvature sign patch classification 
and mean and Gaussian curvature 
shape classification. [GVS03] 


path coherence: A property used in 
tracking objects in an image sequence. 
The assumption is that the object 
motion is mostly smooth in the scene 
and thus the observed motion in a 
projected image of the scene is also 
smooth. [JKS95:14.6] 


path finding: The problem of determin- 
ing a path with given properties in a 
graph, e.g., the shortest path connect- 
ing two given nodes or two nodes with 
given properties. A path is defined as a 
linear subgraph. Path finding is a char- 
acteristic problem of state-space meth- 
ods, inherited from symbolic artificial 
intelligence. See also graph searching. 
This term is also used in the context 
of dynamic programming search, e.g., 
applied to the stereo correspondence 
problem. [WP:Pathfinding] 


pattern grammar: See shape grammar. 
[WP:Pattern_grammar] 


pattern matching: See template
matching.


pattern recognition: A large research 
area concerned with the recog- 
nition and classification of struc- 
tures, relations or patterns in data. 
Classic techniques include syntactic 
pattern recognition, structural pattern 
recognition and statistical pattern 
recognition. [Sch89:Ch. 6] 


PCA: See principal component analysis. 


PDE: Partial differential equation. 


PDM: See point distribution model. 
[WP:Point_distribution_model] 


peak: A general term for when a sig- 
nal value is greater than the neighbor- 
ing signal values. An example of a sig- 
nal peak measured in one dimension 
is when crossing a bright line lying 
on a dark surface along a scanline; if 
the pixel values 7, 45, 105, 54, 7 are 
observed, the peak is at 105. A two- 
dimensional example is when observ- 
ing the image of a bright spot on a 
darker background. [SOS00:3.4.5] 


Pearson’s correlation coefficient: See 
correlation. 


pedestrian detection: The automatic 
identification of people in a street 
scene: [Sze10:14.1.2] 


pedestrian surveillance: See person 
surveillance. 


pel: See pixel. 


penalty term: In fitting models to 
data, it is often useful to carry out 
regularization, i.e., to add a smooth- 
ness or penalty term to the data fit 
term. If the data fit of a model $M$
with parameters $\theta$ on dataset $D$ is
given by the negative log-likelihood
$-\log p(D \mid \theta, M)$, then the penalized
negative log-likelihood takes
the form $J(\theta) = -\log p(D \mid \theta, M) + \lambda \Phi(\theta)$,
where $\Phi(\theta)$ is a penalty term
on $\theta$, and $\lambda > 0$ is a regularization
constant. An example of a penalty term
is a sum of squares $\Phi(\theta) = \sum_i \theta_i^2$ as
used in ridge regression. For a Bayesian
statistical model, the penalty term can
be interpreted as the negative log of
the prior distribution and the minimization
of $J(\theta)$ is equivalent to finding the
maximum a posteriori probability for
$\theta$. [HTF08:3.4]
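
The effect of a sum-of-squares penalty can be seen in the simplest ridge regression case, a line through the origin with one parameter. The sketch assumes a squared-error data fit term (which corresponds to a Gaussian noise model up to scale and an additive constant); the data values and the name `ridge_slope` are illustrative.

```python
def ridge_slope(xs, ys, lam):
    """Minimize sum_i (y_i - w * x_i)^2 + lam * w^2 over the single parameter w."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # Setting the derivative to zero gives w = sxy / (sxx + lam).
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
w_unpenalized = ridge_slope(xs, ys, lam=0.0)
w_penalized = ridge_slope(xs, ys, lam=10.0)
```

Increasing the regularization constant shrinks the estimated parameter towards zero.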


pencil of lines: A bundle of lines passing 
through the same point: 
If $\mathbf{l}$ is a generic bundle line and $\mathbf{p}_0$ the
point through which all lines pass, the
bundle is


$\mathbf{l} = \mathbf{p}_0 + \lambda \mathbf{v}$


where $\lambda$ is a real number and $\mathbf{v}$ is the
direction of the individual line (both
are parameters). [FP03:13.1.4]


penumbra: A region of partial shadow,
from which only part of the light
source is visible because the rest is
obscured. This is distinct from
the umbra (the full shadow region).


[Figure: light source, object and shadows]

[WP:Umbra] 


percentile method: A specialized tech- 
nique used for selecting a thresh- 
old (see thresholding). The method 
assumes that the percentage of the 
scene that belongs to the desired 
object (e.g., a darker object against 
a lighter background) is known. The 
threshold that selects the correct per- 
centage of pixels is used. [JKS95:3.2.1] 
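
A minimal sketch of the method for a dark object on a lighter background follows. The pixel values, the known object fraction and the function name are illustrative assumptions.

```python
def percentile_threshold(pixels, object_fraction):
    """Threshold at or below which roughly `object_fraction` of the pixels fall."""
    ordered = sorted(pixels)
    count = round(object_fraction * len(ordered))
    count = max(1, min(count, len(ordered)))  # keep the index inside the list
    return ordered[count - 1]

pixels = [12, 15, 200, 210, 14, 205, 198, 220, 11, 215]
threshold = percentile_threshold(pixels, 0.4)   # 40% of the scene is known to be object
object_mask = [p <= threshold for p in pixels]
```

For a light object on a dark background the same idea applies with the comparison reversed and the fraction counted from the bright end.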


perception: The process of understand- 
ing the world through the analysis 
of sensory input (such as images). 
[DH73:1.1] 


perceptron: A computational element
$\phi(\mathbf{w} \cdot \mathbf{x} + b)$ that acts on a data
vector $\mathbf{x}$, where $\mathbf{w}$ is a vector of weights,
$b$ is a scalar (called the bias) and
$\phi(\cdot)$ is the activation function. Often
$\phi(\cdot)$ is taken to be a step function.
Perceptrons are often used for
classifying data into one of two classes
(i.e., $\phi(\mathbf{w} \cdot \mathbf{x} + b) > 0$ and $\phi(\mathbf{w} \cdot \mathbf{x} + b) < 0$).
Perceptrons can be combined to
make a multilayer perceptron network.
See also classification, supervised
classification and pattern recognition.
[Bis06:4.1.7]
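
A perceptron with a step activation can be sketched as follows. The training loop uses the classical perceptron learning rule, which is an assumption added for illustration and is not described in the entry; the function names and the toy data (logical AND) are likewise illustrative.

```python
def step(a):
    """Step activation returning +1 or -1."""
    return 1 if a > 0 else -1

def perceptron(w, b, x, activation=step):
    """Evaluate activation(w . x + b) for one data vector x."""
    return activation(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_perceptron(samples, labels, epochs=20, rate=1.0):
    """Classical perceptron learning rule; labels must be +1 or -1."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(samples, labels):
            if perceptron(w, b, x) != t:
                # Move the decision boundary towards the misclassified sample.
                w = [wi + rate * t * xi for wi, xi in zip(w, x)]
                b += rate * t
    return w, b

samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, -1, -1, 1]   # logical AND
w, b = train_perceptron(samples, labels)
predictions = [perceptron(w, b, x) for x in samples]
```

The rule converges only when the two classes are linearly separable, as they are in this example.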


perceptron network: See multilayer
perceptron network and single-layer
perceptron network.


perceptual grouping: See perceptual
organization.


perceptual organization: A theory
based on Gestalt psychology, centered 
on the tenet that certain organizations 
(or interpretations) of visual stimuli are 
preferred over others by the human 
visual system. A famous example is 
that a drawing of a wire-frame cube 
is immediately interpreted as a 3D 
object, instead of a 2D collection of 
lines. This concept has been used in 
several low-level vision systems, typi- 
cally to find groups of low-level fea- 
tures most probably generated by inter- 
esting objects. See also grouping and 
Lowe’s curve segmentation method. 
The figure shows a more complex 
example, where the top ends of the
features suggest a virtual horizontal line:
[FP03:14.2]


performance characterization: A class
of techniques aimed at assessing the 
performance of computer vision sys- 
tems in terms of, e.g., accuracy, preci- 
sion, robustness to noise, repeatability 
and reliability. [TV98:A.1] 


perimeter: 1) In a binary image, the set
of foreground pixels that touch the
background.
2) The length of the path through those
pixels. [JKS95:2.5.6] 


periodic color filter arrays: An array or
mosaic of color filters placed over indi- 
vidual sensing elements within a cam- 
era so as to allow pixels to sense differ- 
ent wavelengths of light and hence per- 
mit the creation of a color image. These 
color filters are normally arranged in 
a periodic fashion. See also Bayer 
pattern. [Sze10:pp. 85-86] 


periodicity estimation: The problem
of estimating the period of a
phenomenon, e.g., determining a fixed
pattern’s size, given a texture created
by its repetition. [LP96] 


person surveillance: A class of tech- 
niques aimed at detecting, tracking, 
counting, and recognizing people or 
their behavior in CCTV videos, for 
security purposes. For example, sys- 
tems have been reported for auto- 
mated surveillance in car parks, banks, 
airports etc. A typical system must 
detect the presence of a person, track 
the person’s movement over time, 
possibly identify the person using a 
database of known faces and classify 
the person’s behavior according to a 
small class of pre-defined behaviors 
(e.g., normal or anomalous). See also 
anomalous behavior detection, face 
recognition and face tracking. [JJS04]


perspective: The rendering of a 3D 
scene as a 2D image according to 
perspective projection, the key char- 
acteristic of which is, intuitively, that 
the size of the imaged objects depends
on their distance from the viewer. As 
a consequence, the image of a bun- 
dle of parallel lines is a bundle of lines 
converging into a point, the vanishing 
point. The geometry of perspective 
was formalized by the master painters 
of the Italian Quattrocento and Renais- 
sance. [FP03:2.2] 


perspective camera: A camera in which 
the image is formed according to 
perspective projection. The corre- 
sponding mathematical model is com- 
monly known as the pinhole camera 
model: [FP03:2.2]


perspective distortion: A type of
distortion in which lines that are parallel
in the real world appear to converge 
in a perspective image. In the figure, 
notice how the train tracks appear to 
converge in the distance: [SB11:2.3.1] 


perspective inversion: The problem 
of determining the position of a 3D 
point from its image, i.e., solving 
the perspective projection equations 
for 3D coordinates. See also absolute 
orientation. [FP03:2.2] 


perspective projection: Imaging a 
scene with foreshortening. The projec- 
tion equation of perspective is 


$x = f \frac{X}{Z}, \qquad y = f \frac{Y}{Z}$


where x, y are the image coordinates 
of an image point in the camera ref- 
erence frame (e.g., in millimeters, not 
pixels), f is the focal length and 
X, Y, Z are the coordinates of the cor- 
responding scene point. [FP03:1.1.1] 
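
The projection can be sketched directly from these definitions. The focal length, the point coordinates and the function name `project` are illustrative; the scene point is assumed to be expressed in the camera reference frame with Z measured along the optical axis.

```python
def project(point, f):
    """Perspective projection of a scene point (X, Y, Z) with focal length f."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("the point must lie in front of the camera (Z > 0)")
    return (f * X / Z, f * Y / Z)

# The same lateral offset seen at two depths (units: meters, f = 35 mm).
near = project((1.0, 0.5, 2.0), f=0.035)
far = project((1.0, 0.5, 4.0), f=0.035)
```

Doubling the depth halves the image coordinates, which is the foreshortening effect characteristic of perspective.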


perspective transformation: The 
transformation which a scene under- 
goes when viewed by a pinhole 
camera. [FP03:1.1.1] 


PET: See positron emission tomography. 


phase-based registration: An image 
registration technique that uses the 
local phase in the two images to deter- 
mine the correct alignment. [GBNO06] 


phase congruency: The property 
whereby components of the Fourier 
transform of an image are maximally in 
phase at feature points, such as step 
edges or lines. Phase congruency is 
invariant to image brightness and con- 
trast and has therefore been used as an 
absolute measure of the significance of 
feature points. See also image feature. 
[WP:Phase_congruency] 


phase correlation: A motion estima- 
tion method that uses the translation- 
phase duality property of the Fourier 
transform (a shift in the spatial domain 
is equivalent to a phase shift in the 
frequency domain). When using
log-polar coordinates and the rotation
and scale properties of the Fourier 
transform, spatial rotation and scale 
can be estimated from the frequency 
shift, independent of spatial transla- 
tion. See also planar motion estimation. 
[WP:Phase_correlation] 
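
The translation case can be sketched in one dimension. The example below is an assumption-laden illustration: it uses a naive O(n^2) discrete Fourier transform to stay self-contained, treats the shift as circular, and all names are hypothetical. A practical implementation would use a fast Fourier transform on 2D images.

```python
import cmath

def dft(signal, inverse=False):
    """Naive discrete Fourier transform (or its inverse)."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = [sum(signal[k] * cmath.exp(sign * 2j * cmath.pi * u * k / n) for k in range(n))
           for u in range(n)]
    return [v / n for v in out] if inverse else out

def phase_correlation_shift(reference, shifted):
    """Estimate the circular shift s such that shifted[k] == reference[k - s]."""
    F, G = dft(reference), dft(shifted)
    cross = []
    for f, g in zip(F, G):
        c = g * f.conjugate()
        # Keep only the phase of the cross-power spectrum.
        cross.append(c / abs(c) if abs(c) > 1e-12 else 0.0)
    surface = dft(cross, inverse=True)
    # The inverse transform of a pure phase ramp peaks at the shift.
    return max(range(len(surface)), key=lambda k: surface[k].real)

reference = [0, 0, 1, 3, 7, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
shifted = reference[-5:] + reference[:-5]   # circular shift right by 5 samples
estimated_shift = phase_correlation_shift(reference, shifted)
```

Normalizing the cross-power spectrum to unit magnitude is what makes the peak sharp and largely independent of signal contrast.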


phase-matching stereo algorithm: 
An algorithm for solving the stereo 
correspondence problem by looking 
for similarity of the phase of the Fourier 
transform. [SMM04] 


phase-retrieval problem: The prob- 
lem of reconstructing a signal based 
on only the magnitude (not the 
phase) of the Fourier transform. 
[WP:Phase_retrieval] 


phase spectrum: The Fourier transform 
of an image can be decomposed into 
its phase spectrum and its power 
spectrum. The phase spectrum is the 
relative phase offset of the given spatial 
frequency. [Hec87:11.2.1] 


phase-unwrapping technique: The 
process of reconstructing the true 
phase shift from phase estimates 
“wrapped” into $[-\pi, \pi]$. The true
phase shift values may not fall in this
interval but may instead be mapped
into the interval by addition or
subtraction of multiples of $2\pi$. The
technique maximizes the smoothness of
the phase image by adding or subtracting
multiples of $2\pi$ at various image
locations. See also Fourier transform. 
[JB94a] 
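
The simplest instance is unwrapping along a 1D scanline, sketched below. It assumes the true phase changes by less than pi between neighboring samples; 2D phase images generally need more elaborate, path-independent methods. The function names are illustrative.

```python
import math

def unwrap(phases):
    """Add multiples of 2*pi so that successive samples differ by at most pi."""
    unwrapped = [phases[0]]
    offset = 0.0
    for prev, cur in zip(phases, phases[1:]):
        delta = cur - prev
        if delta > math.pi:        # wrapped downwards: remove one turn
            offset -= 2.0 * math.pi
        elif delta < -math.pi:     # wrapped upwards: add one turn
            offset += 2.0 * math.pi
        unwrapped.append(cur + offset)
    return unwrapped

def wrap(phi):
    """Map a phase into the interval [-pi, pi)."""
    return (phi + math.pi) % (2.0 * math.pi) - math.pi

true_phase = [0.5 * k for k in range(20)]
wrapped = [wrap(p) for p in true_phase]
recovered = unwrap(wrapped)
```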


phi-s curve ($\phi$-$s$): A technique for
representing planar contours. Each point $P$
in the contour is represented by the
angle $\phi$, formed by the line through
$P$ and the shape’s center (e.g., the
barycenter or center of mass) with
a fixed direction, and the distance $s$
from the center to $P$:


See also shape representation.
[BB82:8.2.3]


Phong reflectance model: An empirical
model of local illumination that
models the way in which surfaces
reflect light as a mixture of diffuse
reflection and specular reflection.
[Sze10:pp. 65-67]


photo consistency: See shape from
photo consistency. [WP:Photo-consistency]


photodiode: The basic element, or 
pixel, of a CCD or other solid state sen- 
sor, converting light to an electric sig- 
nal. [WP:Photodiode] 


photogrammetry: A research area con- 
cerned with obtaining reliable and 
accurate measurements from noncon- 
tact imaging, e.g., a digital height 
map from a pair of overlapping satel- 
lite images. Consequently, accurate 
camera calibration is a primary
concern. The techniques used overlap
with many of those typical of image
processing and pattern recognition. [FP03:3.4]


photography: Taking pictures or videos 
with a camera or other device 
capable of recording an image. 
[WP:Photography] 


photometric decalibration: The cor- 
rection of intensities in an image so 
that the same surface (at the same ori- 
entation) will give the same response 
regardless of the position in which it 
appears in the image. [Dun70] 


photometric invariant: A feature or 
characteristic of an image that is insen- 
sitive to changes in illumination. See 
also invariant. [GLU96] 


photometric stereo: A technique recov- 
ering surface shape (more precisely, 
the surface normal at each surface 
point) using multiple images acquired 
from a single viewpoint but under dif- 
ferent illumination conditions. These 
lead to different reflectance maps, 
which together constrain the surface 
normal at each point. [FP03:5.4] 


photometry: A branch of optics con- 
cerned with the measurement of the 
amount or the spectrum of light. 
In computer vision, one frequently
uses photometric models expressing 
the amount of light emerging from 
a surface, be it fictitious or the sur- 
face of a radiating source, or from 
an illuminated object. A well-known 
photometric model is Lambert’s law. 
[WP:Photometry_(optics)]


photon noise: Noise generated by the 
statistical fluctuations associated with 
photon counting over a finite time 
interval in the CCD or other solid state 
sensor of a digital camera. Photon noise 
is not independent of the signal and is 
not additive. See also image noise and 
digital camera. [WP:Image_noise] 


photopic response: The sensitivity-
wavelength curve modeling the
response of the human eye to nor- 
mal lighting conditions. In such 
conditions, the cones are the photore- 
ceptors on the retina that best respond 
to light. Their response curve peaks 
at 555 nm, indicating that the eye is 
maximally sensitive to green-yellow 
colors in normal lighting conditions. 
When light intensity is very low, the 
rods determine the eye’s response, 
modeled by the scotopic curve, which 
peaks near to 510 nm. [Jai89:3.2] 


photosensor spectral response: The
characterization of a sensor’s out- 
put as a function of the input 
light’s spectral frequency. See also 
spectral response, Fourier transform, 
frequency spectrum and spectral 
frequency. [WP:Frequency_spectrum] 


physics-based vision: An area of
computer vision seeking to apply the laws
or methods of physics (optics, sur- 
faces, illumination etc.) to the anal- 
ysis of images and videos. Examples 
include polarization-based methods, in 
which physical properties of the scene 
surfaces are estimated via estimates of 
the state of polarization of the incom- 
ing light and detailed radiometric mod- 
els of image formation. [DKFH97] 


pictorial pattern recognition: Object 
recognition or the recognition of pat- 
terns where the a priori known mod- 
els are simply pictures of objects 
which have previously been observed: 
[Che69]


picture element: A pixel. An indivisible
image measurement. This is the small- 
est directly measured image feature. 
[SB11:Ch. 3] 


picture tree: A recursive image and 2D
shape representation in which a tree 
data structure is used. Each node in the 
tree represents a region that is then 
decomposed into subregions. These 
are represented by child nodes. The 
figure shows (left) a segmented image 
with four regions and (right) the
corresponding picture tree: [JKS95:3.3.4]


piecewise rigidity: The property of an
object or scene that some of its parts, 
but not the object or scene as a whole, 
are rigid. Piecewise rigidity can be a 
convenient assumption, e.g., in motion 
analysis. [Koe86] 


pincushion distortion: A form of radial
lens distortion where image points are 
displaced away from the center of dis- 
tortion by an amount that increases 
with the distance to the center. A 
straight line that would have been 
parallel to an image side is bowed 
towards the center of the image. This 
is the opposite of barrel distortion: 
[Hec87:6.3.1]


pinhole camera model: The
mathematical model for an ideal perspective
camera formed by an image plane and 
a point aperture through which all 
incoming rays must pass. For equa- 
tions, see perspective projection. This 
is a good model for a simple con- 
vex lens camera, where all rays pass 
through the virtual pinhole at the focal 
point: [FP03:1.1]


pink noise: Noise whose power spectral
density is inversely proportional to
frequency (1/f noise). Unlike white
noise, there is a correlation
between the noise at two pixels or at
two times. [WP:Pink_noise]


pipeline parallelism: Parallelism 
achieved with two or more, possibly 
dissimilar, computation devices. The 
non-parallel process comprises steps 
A and B and operates on a sequence of 
items x;, 7 > 0, producing outputs 4. 
The result of B depends on the result of 
A, so a sequential computer computes 
a; = Ax); yı = Ba); for each i. A 
parallel computer cannot compute 
a; and y; simultaneously as they 
are dependent, so the computation 
requires the following steps: 


a = AM) 
yn = Ba) 
a, = A(X) 
Jn = BC) 
as = A(x) 
a; = AM) 
Vi = BE)... 
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Notice that we compute y; just after 
yi-1, so the computation can be 
arranged as: 


a = A) 
a = A~) n= Ba) 
az = A(x) J2 = B@) 


G41 = AD N= BA) 


Steps on the same line may be com- 
puted concurrently as they are inde- 
pendent. The output values y; there- 
fore arrive at a rate of one every cycle 
rather than one every two cycles with- 
out pipelining. The pipeline process 
can be visualized as: [ATO1] 


X; aj nar 
HI A i B Yi-1 
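
The arrangement can be sketched with two worker threads connected by a queue, so that stage A may start on the next item while stage B processes the previous result. This is a structural illustration only: the function name and stage functions are hypothetical, and in CPython the global interpreter lock limits the speed-up for CPU-bound stages.

```python
import queue
import threading

def run_pipeline(items, stage_a, stage_b):
    """Compute y_i = B(A(x_i)) with A and B running in separate threads."""
    handoff = queue.Queue(maxsize=1)   # carries each a_i from stage A to stage B
    outputs = []
    done = object()                    # sentinel marking the end of the stream

    def worker_a():
        for x in items:
            handoff.put(stage_a(x))
        handoff.put(done)

    def worker_b():
        while True:
            a = handoff.get()
            if a is done:
                break
            outputs.append(stage_b(a))

    threads = [threading.Thread(target=worker_a), threading.Thread(target=worker_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outputs

results = run_pipeline([1, 2, 3, 4], lambda x: x + 1, lambda a: a * 10)
```

The queue preserves the order of the items, so the outputs match those of the sequential computation.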


pit: 1) A general term for when a signal 
value is lower than the neighboring sig- 
nal values, unlike signal peaks, which 
are higher than neighboring values. For 
example, a pit occurs when observing 
the image of a dark spot on a lighter 
background. 
2) A local point-like concave shape 
defect in a surface. [CNH+03] 


pitch: A 3D rotation representation 
(along with yaw and roll) often used 
for cameras or moving observers. The 
pitch component specifies a rota- 
tion about a horizontal axis to give 
an up-down change in orientation: 
[JKS95:12.2.1]


pixel: The intensity values of a digital
image are specified at the locations 
of a discrete rectangular grid; each 
location is a pixel. A pixel is char- 
acterized by its coordinates (position 
in the image) and intensity value (see
intensity and intensity image). Values 
can express physical quantities other 
than intensity for different kinds of 
images (e.g., depth image). In physi- 
cal terms, a pixel is the photosensi- 
tive cell on the CCD or other solid 
state sensor of a digital camera. The 
CCD pixel has a precise size, speci- 
fied by the manufacturer and deter- 
mining the CCD’s aspect ratio. See 
also intensity sensor and photosensor 
spectral response. [SB11:Ch. 3] 


pixel addition operator: A low-level 
image-processing operator taking as 
input two gray scale images, $I_1$ and
$I_2$, and returning an image $I_3$ in which
the value of each pixel is $I_3 = I_1 + I_2$.
The figure shows the sum of the two 
images on the left (divided by 2 to 
rescale to the original intensity level): 
[SB11:3.2.1] 


pixel-based representation: Any
representation where the data is in a 2D
spatial array equivalent to an image. 
See also pixel. 


pixel-change history: The values 
exhibited by a pixel over some past 
period in time. [XGP02] 


pixel classification: The problem of 
assigning the pixels of an image to 
certain classes. This can use either 
supervised classification (with mod- 
els of known classes) or unsupervised 
methods such as clustering (when no 
models are known). See also image
segmentation. The figure shows (left) 
an image and (right) its pixels in four 
classes denoted by four shades of gray: 
[Nal93:3.3.1] 


pixel connectivity: The pattern specify- 
ing which pixels are considered neigh- 
bors of a given one (X) for the purposes 
of computation. Common connectivity 
schemes are (left) 4 connectedness and 
(right) 8 connectedness: [SB11:4.2] 
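
The two schemes can be sketched as neighbor offset lists. The (row, column) indexing, the clipping at the image border and the function name `neighbors` are assumptions made for illustration.

```python
def neighbors(row, col, height, width, connectivity=4):
    """Return the in-image neighbors of pixel (row, col) under 4- or 8-connectedness."""
    if connectivity == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif connectivity == 8:
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]
    else:
        raise ValueError("connectivity must be 4 or 8")
    # Discard neighbors that would fall outside the image.
    return [(row + dr, col + dc) for dr, dc in offsets
            if 0 <= row + dr < height and 0 <= col + dc < width]

inner_4 = neighbors(1, 1, 3, 3, connectivity=4)
inner_8 = neighbors(1, 1, 3, 3, connectivity=8)
corner_8 = neighbors(0, 0, 3, 3, connectivity=8)
```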


pixel coordinate transformation: The 
mathematical transformation linking 
two image reference frames, specify- 
ing how the coordinates of a pixel 
in one reference frame are obtained 
from the coordinate of that pixel 
in the other reference frame. One 
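linear transformation can be specified
by $i_1 = a i_2 + b j_2 + e$, $j_1 = c i_2 + d j_2 + f$,
where the coordinates of
$\mathbf{p}_2 = (i_2, j_2)$ are transformed into $\mathbf{p}_1 = (i_1, j_1)$.
In matrix form, $\mathbf{p}_1 = A \mathbf{p}_2 + \mathbf{t}$,
with $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ a rotation matrix
and $\mathbf{t} = \begin{pmatrix} e \\ f \end{pmatrix}$ a translation vector.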


See also Euclidean transformation, 
affine transformation and homography 
transformation. [SK99] 
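
A short sketch of the linear form, with A chosen as a 90 degree rotation, is given below. The numeric values and the function name `transform_pixel` are illustrative assumptions.

```python
import math

def transform_pixel(p2, A, t):
    """Map p2 = (i2, j2) to p1 = A p2 + t for a 2x2 matrix A and a 2-vector t."""
    i2, j2 = p2
    (a, b), (c, d) = A
    e, f = t
    return (a * i2 + b * j2 + e, c * i2 + d * j2 + f)

angle = math.pi / 2.0
A = ((math.cos(angle), -math.sin(angle)),
     (math.sin(angle), math.cos(angle)))
p1 = transform_pixel((10.0, 0.0), A, (5.0, 2.0))
```

The result is generally not an integer position, so resampling the image at the transformed coordinates requires interpolation.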


pixel coordinates: The coordinates of 
a pixel in an image. Normally these 
are the row and column position. 
[JKS95:12.1]


pixel counting: A simple algorithm to 
determine the area of an image region 
by counting the number of pixels
composing the region. See also region. 
[WP:Simulation_cockpit#Aircraft_ 
Simpits] 


pixel division operator: An operator 
taking as input two gray scale images, 
$I_1$ and $I_2$, and returning an image $I_3$ in
which the value of each pixel is
$I_3 = I_1 / I_2$. [Dav90:Ch. 2]


pixel exponential operator: A low-level
image-processing operator taking as
input one gray scale image, $I_1$, and
returning an image $I_2$ in which the
value of each pixel is $I_2 = c\,b^{I_1}$.
This operator is used to change the 
dynamic range of an image. The value 
of the basis b depends on the desired 
degree of compression of the dynamic 
range and c is a scaling factor. See 
also logarithmic transformation and 
pixel logarithm operator. The figure 


shows (left) an image and (right) the 
image 1.005 raised to the pixel values: 
[BB82:Ch. 1] 
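
A short sketch of the exponential operator and of its logarithmic counterpart (see pixel logarithm operator), assuming images stored as lists of rows. The function names and default parameters are hypothetical, and the logarithm is assumed to take the form $c \log_b(I_1 + 1)$.

```python
import math

def pixel_exp(img, b=1.005, c=1.0):
    """I2 = c * b ** I1 for every pixel."""
    return [[c * b ** p for p in row] for row in img]

def pixel_log(img, b=math.e, c=1.0):
    """I2 = c * log_b(I1 + 1) for every pixel; the +1 keeps log(0) out of range."""
    return [[c * math.log(p + 1, b) for p in row] for row in img]

exp_out = pixel_exp([[0, 2]], b=2.0, c=3.0)   # 3 * 2**0 and 3 * 2**2
log_out = pixel_log([[0, 255]], b=2.0)        # log2(1) and log2(256)

# Changing the base only rescales the result: the ratio is a constant.
ratio = pixel_log([[9]], b=math.e)[0][0] / pixel_log([[9]], b=10.0)[0][0]
```

The constant ratio in the last line equals ln 10, which is why the choice of base can be absorbed into the scaling factor c.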


pixel gray scale resolution: The number
of different gray levels that can be
represented in a pixel, depending on
the number of bits associated with each
pixel. For instance, an 8-bit pixel (or
image) can represent $2^8 = 256$
different intensity values. See also
intensity, intensity image and
intensity sensor. [GAM01]


pixel interpolation: See image
interpolation. [WP:Pixelation]


pixel jitter: A frame grabber must
estimate the pixel sampling clock of a
digital camera, i.e., the clock used to
read out the pixel values, which is not
included in the output signal of the
camera. Pixel jitter is a form of image
noise generated by time variations in
the frame grabber’s estimate of the
camera’s clock. [RWG+09]


pixel logarithm operator: An
image-processing operator taking as
input one gray scale image, $I_1$, and
returning an image $I_2$ in which the
value of each pixel is
$I_2 = c \log_b(I_1 + 1)$. This operator
is used to change the dynamic range of
an image (see also contrast
enhancement), such as for the
enhancement of the magnitude of the
Fourier transform. The base $b$ of the
logarithm function is often $e$, but it
does not actually matter because the
relationship between logarithms of any
two bases is only one of scaling. See
also pixel exponential operator. The
figure shows (left) an image and
(right) the scaled logarithm of the
pixel values of the image: [SB11:3.3.1]


pixel multiplication operator: An
image-processing operator taking as
input two gray scale images, $I_1$ and
$I_2$, and returning an image $I_3$ in
which the value of each pixel is
$I_3 = I_1 * I_2$. The figure shows (left
and middle) two images and (right)
their product (scaled by 255 for
contrast): [SB11:3.2.1.2]


pixel subsampling: The process of 


producing a smaller image from a 
given one by including only one 
pixel out of every N. Subsampling 
is rarely applied this literally, how- 
ever, as severe aliasing is introduced; 
scale space filtering is applied instead. 
[AP08] 


pixel subtraction operator: A low-level
image-processing operator taking as
input two gray scale images, $I_1$ and
$I_2$, and returning an image $I_3$ in
which the value of each pixel is
$I_3 = I_1 - I_2$. This operator
implements the simplest possible change
detection algorithm. The figure shows
(left and middle) two images and
(right) the middle image subtracted
from the left image (with 128 added):
[SB11:3.2.1]


place recognition: The identification of
the viewer’s location. [Sze10:14.3.3]


planar facet model: See surface mesh. 


planar mosaic: A panoramic image 
mosaic of a planar scene. The trans- 
formation linking different views of a 
planar scene is a homography. [CZ98] 


planar motion estimation: A class 
of techniques aiming to estimate the 
motion parameters of bodies moving 
on a plane in space. See also motion 
estimation. [HZ00:18.8] 


planar patch extraction: The prob- 
lem of finding planar regions, or 
patches, most commonly in range 
images. Plane extraction can be use- 
ful, e.g., in 3D pose estimation, as sev- 
eral model-based matching techniques 
yield higher accuracy with planar than 
non-planar surfaces. [CZ01] 


planar patches: See surface
triangulation.


planar projective transformation: 
See homography. 


planar rectification: A class of 
rectification algorithms projecting the 
original images onto a plane parallel to 
the baseline of the cameras. See also 
stereo and stereo vision. [RMC97] 


planar scene: 1) When the depth of 
a scene is small with respect to its 
distance from the camera, the scene 
can be considered planar, and useful 
approximations can be adopted; e.g., 
the transformation between two views 
taken by a perspective camera is a 
homography. See also planar mosaic. 
2) When all of the surfaces in a scene 
are planar, e.g., in a blocks-world 
scene. [BS03] 


Planckian locus: The curve that an 
incandescent black body would fol- 
low in chromaticity space as it changes 
temperature. [WP:Planckian_locus] 


plane: The locus of all points $\vec{x}$
such that the surface normal $\vec{n}$
of the plane and a point in the plane
$\vec{p}$ satisfy the relation
$(\vec{x} - \vec{p}) \cdot \vec{n} = 0$.
In 3D space, e.g., a plane is defined
by two vectors and a point lying on the
plane, so that the plane’s parametric
equation is:


$\vec{p} = a\vec{u} + b\vec{v} + \vec{p}_0$,


where $\vec{p}$ is the generic plane
point, $\vec{u}$, $\vec{v}$ are the two
vectors and $\vec{p}_0$ is the point
defining the plane. The implicit
equation of a plane is
$ax + by + cz + d = 0$, where
$[x, y, z]$ are the coordinates of the
generic plane point. In vector form,
$\vec{p} \cdot \vec{n} = -d$, where
$\vec{p} = [x, y, z]$,
$\vec{n} = [a, b, c]$ is a vector
perpendicular to the plane, and
$|d| / \|\vec{n}\|$ is the distance of
the plane from the origin. All of these
definitions are equivalent.
[JKS95:13.3.1]
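
The equivalence of these forms can be checked numerically. The sketch below assumes 3D points and vectors held as tuples and uses the implicit form $ax + by + cz + d = 0$. All function names and the example plane (z = 2) are hypothetical.

```python
import math

def plane_from_point_normal(p0, n):
    """Return (a, b, c, d) such that a*x + b*y + c*z + d = 0 on the plane."""
    a, b, c = n
    d = -(a * p0[0] + b * p0[1] + c * p0[2])
    return (a, b, c, d)

def distance_from_origin(plane):
    """Distance |d| / ||n|| of the plane from the origin."""
    a, b, c, d = plane
    return abs(d) / math.sqrt(a * a + b * b + c * c)

def on_plane(x, plane, tol=1e-9):
    a, b, c, d = plane
    return abs(a * x[0] + b * x[1] + c * x[2] + d) < tol

# Example (assumed values): the plane z = 2 with an unnormalized normal.
p0 = (0.0, 0.0, 2.0)
plane = plane_from_point_normal(p0, (0.0, 0.0, 3.0))

# A point generated by the parametric form p = a*u + b*v + p0.
u, v = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
p = tuple(1.5 * ui - 4.0 * vi + p0i for ui, vi, p0i in zip(u, v, p0))
```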


plane conic: Any of the curves defined 
by the intersection of a plane with a 
3D double cone: ellipse, hyperbola and 
parabola. Two intersecting lines and a 
single point represent degenerate con- 
ics, defined by special configurations 
of the cone and the plane. The implicit 
equation of a conic is
$ax^2 + bxy + cy^2 + dx + ey + f = 0$.
See also conic
fitting. The figure shows an ellipse 
formed by intersection: [JKS95:6.6] 
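
For a non-degenerate conic, the type can be read from the sign of the discriminant $b^2 - 4ac$ of the implicit equation. The sketch below assumes exact or well-scaled coefficients; the function name and the tolerance are hypothetical, and degenerate cases are not detected.

```python
def classify_conic(a, b, c, tol=1e-12):
    """Classify a non-degenerate conic a*x^2 + b*x*y + c*y^2 + ... = 0."""
    disc = b * b - 4 * a * c
    if abs(disc) <= tol:
        return "parabola"
    return "ellipse" if disc < 0 else "hyperbola"

circle = classify_conic(1, 0, 1)        # x^2 + y^2 - 1 = 0
parabola = classify_conic(1, 0, 0)      # x^2 - y = 0
hyperbola = classify_conic(1, 0, -1)    # x^2 - y^2 - 1 = 0
```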


plane + parallax: A framework in which
multiple cameras are arranged on a 
plane and in which parallax is used rel- 


ative to a reference plane to estimate 
depth. [Sze10:p. 55] 


plane projective transfer: An algorithm
based on projective invariants that,
given two images of a planar object,
$I_1$ and $I_2$, and four feature
correspondences, determines the
position of any other point of $I_1$ in
$I_2$. Interestingly, no knowledge of
the scene or
of the imaging system’s parameters is 
necessary. [DZB92] 


plane projective transformation: The 
linear transformation between the 
coordinates of two projective planes, 
also known as homography. See 
also projective geometry, projective 
plane and projective transformation. 
[FP03:18.4.1] 


plane sweeping: An algorithm where a 
plane is virtually swept across a 3D vol- 
ume, processing data as it is encoun- 
tered. The 2D version of this approach 
is called line sweeping. [GFM+07] 


plenoptic camera: A camera that 
records the flow of light at all posi- 
tions and in all directions. [WP:Light- 
field_camera] 


plenoptic function representation: A 
parameterized function for describing 
everything that is visible from a given 
point in space. A fundamental rep- 
resentation in image-based rendering. 
[FP03:26.3] 


plenoptic sampling: Sampling of the 
light field function (i.e., the plenoptic 
function). Often this term additionally 
refers to the sampling of the surface 
and texture information in the scene. 
[CTCS00]


Plessey corner finder: A well-known
corner detector also known as the
Harris corner detector, based on the
local autocorrelation of first-order
image derivatives. See also feature
extraction.
[WP:Corner_detection#The_Harris_.26_Stephens_.2F_Plessey_.2F_Shi.E2.80.93Tomasi_corner_detection_algorithm]


Plücker line coordinates: A
representation of lines in projective
3D space. A line is represented by six
numbers
$(l_{12}, l_{13}, l_{14}, l_{23}, l_{24}, l_{34})$
that must satisfy the constraint that
$l_{12}l_{34} - l_{13}l_{24} + l_{14}l_{23} = 0$.
The numbers are the entries of the
Plücker matrix, $L$, for the line. For
any two points $A$, $B$ on the line,
$L$ is given by
$l_{ij} = A_i B_j - B_i A_j$. The pencil
of planes containing the line are the
nullspace of $L$. The six numbers may
also be seen as a pair of 3-vectors,
one a point $\vec{a}$ on the line, one
the direction $\vec{n}$ with
$\vec{a} \cdot \vec{n} = 0$.
[Fau93:2.5.1]


PMS: See Pantone matching system.
[WP:Pantone]


PnP problem: The perspective n point
problem: estimating camera pose from
n known 3D to 2D point
correspondences. [Sze10:6.2.1]


point: A primitive concept of Euclidean
geometry, representing an infinitely
small entity. In computer vision, pixels
are regarded as image points and one
speaks of “points in the scene” as
positions in the 3D space observed by
the cameras. [WP:Point_(geometry)]


point-based rendering: Methods for
drawing an image based on point
primitives which are 3D points rather
than polygonal patches. [BK03]


point cloud: A set (usually large) of
points in 3D space. [WP:Point_cloud]


point distribution model (PDM): A
shape representation for flexible 2D
contours. It is a type of deformable
template model and its parameters
can be learned by supervised learning.
It is suitable for 2D shapes that
undergo general but correlated
deformations or variations, such as
component motion or shape variation.
For instance, fronto-parallel images of
leaves, fish or human hands, resistors
on a board, people walking in
surveillance videos etc. The shape
variations of the contour in a series
of examples are captured by principal
component analysis. See also active
shape model.
[WP:Point_distribution_model]


point feature: An image feature that 
occupies a very small portion of an 
image, ideally one pixel, and is there- 
fore local in nature. Examples are cor- 
ners (see corner detection) or edge 
pixels. Notice that, although point fea- 
tures occupy only one pixel, they 
require a neighborhood to be defined; 
e.g., an edge pixel is characterized 
by a sharp variation of image values 
in a small neighborhood of the pixel. 
[CMS95] 


point invariant: A property that can 
be measured at a point in an image 


and is invariant to some transforma- 
tion. For instance, the ratio of a pixel’s 
observed intensity to that of its bright- 
est neighbor is invariant to changes in 
illumination and the magnitude of the 
gradient of intensity at a point is invari- 
ant to translation and rotation. (Both 
of these examples assume ideal images 
and observation.) [MS04] 


point light source: A point-like light
source, typically radiating energy
radially, whose intensity decreases as
$1/r^2$, where $r$ is the distance to
the source.
[FP03:5.2.2] 


point matching: A class of algorithms 
solving feature matching or the stereo 
correspondence problem for point 
features. [Zha94] 


point of extreme curvature: A point 
where the curvature achieves an 
extremum, i.e., a maximum or a 
minimum. The figure shows one of 
each type of extremum: [WP:Vertex_ 
(geometry)] 


[Figure: a curve with its curvature minima and maxima labeled]


point sampling: Selection of discrete 
points of data from a continuous 
signal. For example, a digital camera 
samples a continuous image function 
into a digital image. [WP:Sampling_ 
(signal_processing)] 


point similarity measure: A func- 
tion measuring the similarity of image 
points (actually small neighborhoods 
to include sufficient information to 
characterize the image location), e.g., 
cross correlation, sum of absolute dif- 
ferences (SAD), or sum of squared dif- 
ferences (SSD). [JP73] 
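
Two of the measures named above can be sketched directly, assuming the neighborhoods are equal-sized patches stored as lists of rows. The function names and patch values are hypothetical. Note that SAD and SSD are strictly dissimilarity scores: lower values mean more similar patches.

```python
def sad(patch_a, patch_b):
    """Sum of absolute differences between two equal-sized patches."""
    return sum(abs(p - q)
               for row_a, row_b in zip(patch_a, patch_b)
               for p, q in zip(row_a, row_b))

def ssd(patch_a, patch_b):
    """Sum of squared differences between two equal-sized patches."""
    return sum((p - q) ** 2
               for row_a, row_b in zip(patch_a, patch_b)
               for p, q in zip(row_a, row_b))

patch_a = [[10, 20], [30, 40]]
patch_b = [[12, 18], [30, 45]]

sad_score = sad(patch_a, patch_b)   # 2 + 2 + 0 + 5
ssd_score = ssd(patch_a, patch_b)   # 4 + 4 + 0 + 25
```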


point source: A point light source. 
An ideal illumination source in which 
all light comes from a single spatial 
point. The alternative is an extended 
light source. The assumption of being 
a point source allows easier inter- 
pretation of shading, shadows etc. 
[FP03:5.2.2] 


point spread function: The response 
of a 2D system or filter to an input 
Dirac impulse. The response is typi- 
cally spread over a region surround- 
ing the point of application of the 
impulse, hence the name. Analogous 
to the impulse response of a 1D sys- 
tem. See also filter and linear filter. 
[FP03:7.2.2] 


Poisson distribution: A discrete
random variable taking on the values
0, 1, 2, ... with probability mass
function
$p(x) = \frac{1}{x!} \lambda^x e^{-\lambda}$.
$\lambda$ is known as the rate
parameter and $E[x] = \lambda$.
[Was04:p. 27] 
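
A brief numerical check of this probability mass function, assuming a rate of 3 and a truncation of the infinite sum at 60 terms (both arbitrary choices for the example). It also illustrates that the variance equals the mean, the property referred to under Poisson noise.

```python
import math

def poisson_pmf(x, lam):
    """p(x) = lam**x * exp(-lam) / x! for x = 0, 1, 2, ..."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

lam = 3.0
probs = [poisson_pmf(x, lam) for x in range(60)]   # truncated support (assumption)

total = sum(probs)                                                # close to 1
mean = sum(x * p for x, p in enumerate(probs))                    # close to lam
variance = sum((x - mean) ** 2 * p for x, p in enumerate(probs))  # close to lam
```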


Poisson noise: Noise which has the 
form of a Poisson distribution, where 
the variance scales directly with the 
mean intensity. It can arise from CCD 
sensors. 


[Figure panels: original image; Poisson noise (enhanced); combined image]


Poisson noise removal: The removal or 
attenuation of Poisson noise. [SHW93] 


polar coordinates: A system of coor- 
dinates specifying the position of a 
point P in terms of the direction of 
the line through P and the origin, and 
the distance from P to the origin along 
that line. For example, the
transformation between polar
$(r, \theta)$ and Cartesian coordinates
$(x, y)$ in the plane is given by
$x = r\cos\theta$ and $y = r\sin\theta$,
or $r = \sqrt{x^2 + y^2}$ and
$\theta = \operatorname{atan}(y/x)$.
[BB82:A1.1.2]
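
A minimal sketch of the two conversions. One deliberate design choice, flagged as an assumption: the inverse uses the two-argument arctangent (math.atan2) rather than atan(y/x), so that the angle lands in the correct quadrant and x = 0 causes no division by zero. The function names are hypothetical.

```python
import math

def polar_to_cartesian(r, theta):
    return (r * math.cos(theta), r * math.sin(theta))

def cartesian_to_polar(x, y):
    # atan2 resolves the quadrant that atan(y / x) alone cannot.
    return (math.hypot(x, y), math.atan2(y, x))

r, theta = cartesian_to_polar(-1.0, 1.0)    # second quadrant: theta = 3*pi/4
x, y = polar_to_cartesian(r, theta)         # round trip back to (-1, 1)
```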


polar harmonic transforms: Transfor- 
mations that can be used to generate 
rotation-invariant features. [PXK10]


polar rectification: A rectification
algorithm designed to cope with any
camera geometry in the context of
uncalibrated vision, re-parameterizing
the images in polar coordinates around
the epipoles. [PKG99]


polarization: The characterizing prop- 
erty of polarized light. [Hec87:Ch. 8] 


polarized light: Unpolarized light 
results from the nondeterministic 
superposition of the x and y com- 
ponents of the electric field. Other- 
wise, light is said to be polarized and 
the tip of the electric field evolves 
on an ellipse (elliptically polarized 
light). Light is often partially polar- 
ized, that is, it can be regarded as the 
sum of completely polarized and com- 
pletely unpolarized light. In computer 
vision, polarization analysis is an area of 
physics-based vision that has been used 
for metal-dielectric discrimination, 
surface reconstruction, fish classifica- 
tion, defect detection and structured 
light triangulation. [Hec87:Ch. 8] 


polarizer: A device changing the state 
of polarization of light to a specific 
polarized state, e.g., producing lin- 
early polarized light in a given plane. 
[Hec87:8.2] 


Polya distribution: Another name for 
the negative binomial distribution. 
[WP:Negative_binomial_distribution] 


polycurve: A simple curve C that is 
smooth everywhere but at a finite set 
of points and such that, given any point 
P on C, the tangent to C converges 
to a limit approaching P from each 
direction. Shape models in computer 
vision often describe boundary shapes 
using polycurve models consisting 
of a sequence of curved or straight
segments. See also polyline. The figure
shows a polycurve with four circular 
arcs: [CF89] 


polygon: A closed, piecewise linear, 2D 
contour. Squares and pentagons are 
examples of regular polygons, where 
all sides have equal length and all 
angles formed by contiguous sides are 
equal. This does not hold for a general 
polygon. [WP:Polygon] 


polygon matching: A class of tech- 
niques for matching polygonal shapes. 
See polygon. [TVH05] 


polygonal approximation: A polyline 
approximating a curve. The figure 
shows a circular arc (badly) approxi- 
mated by a polyline: [BB82:8.2] 


polyhedron: A 3D object with planar 
faces, a “3D polygon”. A subset of
$\mathbb{R}^3$ whose boundary is a subset of
finitely many planes. The basic prim- 
itive of many 3D modeling schemes, 
as many hardware accelerators process 
polygons particularly quickly. A tetra- 
hedron is the simplest polyhedron: 
[DH73:12.4] 


polyline: A piecewise linear contour. 
If closed, it becomes a polygon. See 
also polycurve, contour analysis and 
contour representation. [JKS95:6.4] 


pose: The location and orientation of 
an object in a given reference frame, 
especially a world or camera reference 
frame. A classic problem of computer 
vision is pose estimation. [SQ04:4.2.2] 


pose clustering: A class of algorithms 


solving the pose estimation prob- 
lem using clustering techniques. See 
also clustering, pose and k-means 
clustering. [FP03:18.3] 


pose consistency: An algorithm seeking
to establish whether two shapes are
equivalent. Given two sets of points
$G_1$ and $G_2$, e.g., the algorithm
finds a sufficient number of point
correspondences to determine a
transformation $T$ between the two
sets, then applies $T$ to all other
points of $G_1$. If the transformed
points are close to points in $G_2$,
consistency is satisfied. Also
known as viewpoint consistency. See 
also feature point correspondence. 
[FP03:18.2] 


pose determination: See pose
estimation.


pose estimation: The problem of
determining the orientation and
translation of an object, especially a
3D one, from one or more images
thereof. Often the term means finding
the transformation that aligns a
geometric model with the image data.
Several techniques exist for this
purpose. See also alignment, model
registration, orientation
representation and rotation
representation.
[WP:Pose_(computer_vision)#Pose_Estimation]


pose representation: The problem of
representing the angular position, or
pose, of an object (especially a 3D
object) in a given reference frame. A
common representation is the rotation
matrix, which can be parameterized in
different ways, e.g., Euler angles,
pitch, yaw and roll angles, rotation
angles around the coordinate axes,
axis-angle curves and quaternions. See
also orientation representation and
rotation representation. [MGC96]


position: Location in space (either 2D
or 3D). [WP:Position_(vector)]


position-dependent brightness
correction: A technique seeking to
counteract the brightness variation
caused by a real imaging system,
typically the fact that brightness
decreases as one moves away from the
optical axis in a lens system with
finite aperture. This effect may be
noticeable only in the periphery of the
image. See also lens. [SHB08:4.1.1]


position invariant: Any property that
does not vary with position. For
instance, the length of a 3D line
segment is invariant to the line’s
position in 3D space, but the length of
the line’s projection on the image
plane is not. See also invariant.
[OAE95]


positron emission tomography (PET): A
medical imaging method that can measure
the concentration and movement of a
positron-emitting isotope in living
tissue. [Jai89:10.1]


postal code analysis: A set of image
analysis techniques concerned with
understanding written or printed postal
codes. See handwritten character
recognition and optical character
recognition.
[WP:Handwriting_recognition]


posterior distribution: A conditional
probability distribution that describes
the situation after some data or
evidence has been observed. Contrast
with the prior distribution. In a
Bayesian statistical model, the
posterior distribution
$p(\theta|D, M)$ expresses the posterior
uncertainty about the parameters
$\theta$ in model $M$ after the data
$D$ has been observed. See also
a posteriori probability and Bayesian 
statistical model. [Bis06:1.2.3] 


posterior probability: See posterior
distribution.


posture analysis: A class of techniques 


aiming to estimate the posture of an 
articulated body, such as a human 
body (e.g., pointing, sitting, standing, 
crouching etc.).
[WP:Motion_analysis#Applications]


potential field: A mathematical function
that assigns some (usually scalar) value
at every point in some space. In
computer vision and robotics, this is
usually a measure of some scalar
property at each point of a 2D or 3D
space or image, such as the distance
from a structure. The representation is
used in path planning, such that the
potential at every point indicates,
e.g., the ease or difficulty of getting
to some destination. [DH73:5.11]


potential function: A real function with
continuous partial second derivatives
that satisfies Laplace’s equation
$\nabla^2 f = 0$; also known as a
harmonic function. See also Laplacian.
[Wei12:Harmonic Function]


power spectrum: In the context of
computer vision, normally the amount of
energy at each spatial frequency. The
term could also refer to the amount of
energy at each light frequency. Also
called the “power spectrum density
function” or “spectral density
function”. [Jai89:11.5]


precision: 1) The repeatability of the
accuracy of a vision system (in
general, of an instrument) over many
measures carried out in the same
conditions. Typically measured by the
standard deviation of a target error
measure. For instance, the precision of
a vision system measuring linear size
would be assessed by taking thousands
of measurements of a perfectly known
object and computing the standard
deviation of the measurements. See also
accuracy.

2) The number of significant bits in a
floating point or double precision
number that lie to the right of the
decimal point.
[WP:Accuracy_and_precision]


predictive compression method: A class
of image compression algorithms using
redundancy information, mostly
correlation, to build an estimate of a
pixel value from values of neighboring
pixels. [WP:Linear_predictive_coding]


pre-processing: Operations on an image
that, e.g., suppress some distortions
or enhance some features. Examples
include geometric transformations, edge
detection, image restoration etc. There
is no clear distinction between image
pre-processing and image processing.
[WP:Data_Pre-processing]


Prewitt gradient operator: An edge
detection operator based on template
matching. It applies a set of
convolution masks, or kernels (see
Prewitt kernel), implementing matched
filters for edges at various (generally
eight) orientations. The magnitude (or
strength) of the edge at a given pixel
is the maximum of the responses to the
masks. Some implementations use the sum
of the absolute value of the responses
from the horizontal and vertical masks.
[JKS95:5.2.3]


Prewitt kernel: The horizontal and verti- 
cal masks used by the Prewitt gradient 
operator: [JKS95:5.2.3] 
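
The figure is not reproduced here, so the sketch below writes out the 3 x 3 masks in the form commonly given for the Prewitt operator. Sign and orientation conventions vary between texts, so the constants PREWITT_X and PREWITT_Y, and the helper names, should be read as assumptions. The edge strength is computed with the sum-of-absolute-values variant mentioned under Prewitt gradient operator.

```python
# Commonly quoted 3x3 Prewitt masks (sign conventions differ between sources).
PREWITT_X = [[-1, 0, 1],
             [-1, 0, 1],
             [-1, 0, 1]]      # difference along a row
PREWITT_Y = [[-1, -1, -1],
             [ 0,  0,  0],
             [ 1,  1,  1]]    # difference along a column

def apply_mask(img, mask, r, c):
    """Correlate a 3x3 mask with the neighborhood centered on pixel (r, c)."""
    return sum(mask[dr + 1][dc + 1] * img[r + dr][c + dc]
               for dr in (-1, 0, 1) for dc in (-1, 0, 1))

def prewitt_strength(img, r, c):
    gx = apply_mask(img, PREWITT_X, r, c)
    gy = apply_mask(img, PREWITT_Y, r, c)
    return abs(gx) + abs(gy)

# A vertical step edge: dark columns on the left, bright on the right.
img = [[0, 0, 10, 10],
       [0, 0, 10, 10],
       [0, 0, 10, 10]]
gx = apply_mask(img, PREWITT_X, 1, 1)      # 3 * (10 - 0)
gy = apply_mask(img, PREWITT_Y, 1, 1)      # rows are identical
strength = prewitt_strength(img, 1, 1)
```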


Prim’s algorithm: An algorithm for find- 


ing the minimal spanning tree, in the 
sense of minimizing the weight of the 
edges, while still including every ver- 
tex. [Gib85:2.1.1] 
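
A compact sketch of Prim’s algorithm using a priority queue, assuming a connected undirected graph stored as a dictionary that maps each vertex to a list of (weight, neighbor) pairs. The function name, the graph layout and the example weights are hypothetical.

```python
import heapq

def prim_mst(graph, start):
    """Grow a minimal spanning tree from `start`; return (total_weight, edges)."""
    visited = {start}
    frontier = [(w, start, v) for w, v in graph[start]]
    heapq.heapify(frontier)
    edges, total = [], 0
    while frontier and len(visited) < len(graph):
        w, u, v = heapq.heappop(frontier)    # cheapest edge leaving the tree
        if v in visited:
            continue                         # both ends already in the tree
        visited.add(v)
        edges.append((u, v, w))
        total += w
        for w2, x in graph[v]:
            if x not in visited:
                heapq.heappush(frontier, (w2, v, x))
    return total, edges

graph = {
    "A": [(1, "B"), (4, "C")],
    "B": [(1, "A"), (2, "C"), (5, "D")],
    "C": [(4, "A"), (2, "B"), (3, "D")],
    "D": [(5, "B"), (3, "C")],
}
total, edges = prim_mst(graph, "A")
```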


primal sketch: A representation for
early vision, introduced by Marr,
focusing on low-level features, such as
edges. The full primal sketch groups 
the information computed in the raw 
primal sketch (consisting largely of 
edge, bar, end and blob feature infor- 
mation extracted from the images), 
e.g., by forming subjective contours. 
See also Marr’s theory, Marr-Hildreth 
edge detector and raw primal sketch. 
[Nev82:7.2] 


primary color: A color coding scheme 


whereby a range of perceivable colors 
can be made by a weighted combina- 
tion of primary colors. For example, 
color television and computer screens 
use light-emitting chemicals to pro- 
duce the three primary colors (red, 
green and blue). The ability to use 
only three colors to generate all oth- 
ers arises from the tri-chromacy of 
the human eye, which has cones that 
respond to three different color spec- 
tral ranges. See also additive color and 
subtractive color. [Hec87:4.4] 


principal component analysis (PCA): A
statistical technique useful for
reducing the dimensionality of data, at
the basis of many computer vision
techniques (e.g., point distribution
models and eigenspace-based
recognition). In essence, the deviation
of a random vector, $\mathbf{x}$, from
the population mean, $\boldsymbol{\mu}$,
can be expressed as the product of $A$,
the matrix of eigenvectors of the
covariance matrix of the population,
and a vector $\mathbf{y}$ of projection
weights:


$\mathbf{y} = A(\mathbf{x} - \boldsymbol{\mu})$


so that


$\mathbf{x} = A^{-1}\mathbf{y} + \boldsymbol{\mu}$.


Usually only a subset of the components
of $\mathbf{y}$ is sufficient to
approximate $\mathbf{x}$. The elements
of this subset correspond to the
largest eigenvalues of the covariance
matrix. See also Karhunen-Loéve
transformation. [FP03:22.3.1]


principal component basis space:
In principal component analysis, the
space generated by the basis formed
by the eigenvectors, or
eigendirections, of the covariance
matrix.
[WP:Principal_component_analysis]


principal component representation: See
principal component analysis.
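
As an illustration of principal component analysis in the smallest non-trivial case, the sketch below handles 2D points with a closed-form eigen-decomposition of the 2 x 2 covariance matrix. All names are hypothetical. The rows of A are assumed to be unit eigenvectors, so the inverse of A is its transpose, and keeping only the first k components gives the approximation described in the entry.

```python
import math

def pca_2d(points):
    """Return (mean, A), with the rows of A the eigenvectors of the covariance."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Orientation of the eigenvector with the largest eigenvalue (2x2 closed form).
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    e1 = (math.cos(angle), math.sin(angle))
    e2 = (-math.sin(angle), math.cos(angle))
    return (mx, my), (e1, e2)

def project(x, mean, A):
    """y = A (x - mean)."""
    d = (x[0] - mean[0], x[1] - mean[1])
    return tuple(row[0] * d[0] + row[1] * d[1] for row in A)

def reconstruct(y, mean, A, k=2):
    """x = A^T y + mean, using only the first k components of y."""
    return tuple(mean[j] + sum(A[i][j] * y[i] for i in range(k)) for j in range(2))

points = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # all on the line y = 2x
mean, A = pca_2d(points)
weights = [project(p, mean, A) for p in points]
approx = [reconstruct(y, mean, A, k=1) for y in weights]
```

Because the example points are collinear, the second projection weight vanishes and one component reconstructs every point.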


principal curvature: The maximum or 


minimum normal curvature at a sur- 
face point, achieved along a principal 
direction. The two principal curva- 
tures and directions together com- 
pletely specify the local surface shape. 
In the figure, the principal curvatures 
in the two directions at point X on 
the cylinder of radius $r$ are 0 (along
the axis) and $1/r$ (across the axis):
[JKS95:13.3.2] 


[Figure: cylinder with the principal directions at point X labeled]


principal curvature sign class: See
mean and Gaussian curvature shape
classification.


principal direction: The direction in 


which the normal curvature achieves 
an extremum, that is, a principal 
curvature. The two principal curva- 
tures and directions, together, specify 
completely the local surface shape. In 
the figure, the principal directions at 
point X on the cylinder are parallel 
to the axis and around the cylinder: 
[FP03:19.1.2] 


[Figure: cylinder with the principal directions at point X labeled]


principal point: The point at which 


the optical axis of a pinhole camera 
model intersects the image plane: 
[JKS95:12.9] 


[Figure: pinhole camera showing the principal point, image plane, pinhole, optical axis and scene object]


principal texture direction: An algo- 


rithm identifying the direction of a 
texture. A directional or oriented 
texture in a small image patch gener- 
ates a peak in the Fourier transform. 
To determine the direction, the Fourier 
amplitude plot is regarded as a distribu- 
tion of physical mass and the minimum- 
inertia axis is identified. [JS05] 


prior distribution: A probability
distribution that encodes an agent’s
beliefs about some uncertain quantity
before some evidence or data is taken
into account. For example in a Bayesian
statistical model, the prior
distribution $p(\theta|M)$ expresses the
prior uncertainty about the parameters
$\theta$ in model $M$. See also
Bayesian statistical model.
[Bis06:1.2.3]


prior domain knowledge: Information
about the scenario or class of objects
known in advance. See also a priori
probability.


privileged viewpoint: A viewpoint
where small motions cause image
features to appear or disappear. This
contrasts with a generic viewpoint.
[Cow83]


probabilistic causal model: A
representation used in artificial
intelligence for causal models. The
simplest causal model is a causal
graph, in essence an acyclic graph in
which nodes represent variables and
directed arcs represent cause and
effect. A probabilistic causal model is
a causal graph with the probability
distribution of each variable
conditional to its causes. See also
probabilistic graphical model. [PR87]


probabilistic data association: A form
of data association for the tracking
problem where all of the candidate
observations for association to a track
are combined in a single weighted
sum. [FP03:17.4]


probabilistic distribution: See
probability distribution.


probabilistic graphical model: A 


graphical model that defines a joint 
probability distribution over a set 
of random variables. The graph- 
ical structure encodes conditional 
independence relationships. There are 
two main types of probabilistic graphi- 
cal model in common use, the directed 
graphical model (also known as a 
Bayesian network) and the undirected 
graphical model (also known as a 
Markov random field). [Bis06:Ch. 8] 


probabilistic Hough transform: Com- 


putes an approximation to the Hough 
transform by using only a percentage of 
the image data. The goal is to reduce 
the computational cost of the standard 
Hough transform. A threshold effect 
has been observed: if the percentage 
sampled is above the threshold level 
then few false positives are detected. 
[WP:Randomized_Hough_Transform] 


probabilistic inference: Given a joint 
probability distribution over a set of
random variables and knowledge of
the values of some set of variables 
e (the evidence), probabilistic infer- 
ence is concerned with the conditional 
distribution of (a subset of) the remain- 
ing variables x, i.e., p(x|e). If x has 
large cardinality then it will make sense 
to focus on summaries of p(x|e); com- 
mon choices are the posterior marginal 
distribution of each variable in the 
set x, or the maximum a posteriori 
probability configuration of p(x|e). In 
a probabilistic graphical model, such 
summaries can often be computed via a
message-passing process such as belief 
propagation. [Mur12:6.5] 


probabilistic latent semantic analysis 


(PLSA): Usually described in relation 
to a set of documents and the words 
they contain. For computer vision, this 
could be translated to a set of images, 
each of which is described by a bag 
of features. PLSA models the distribu- 
tion of words in a document in terms 
of the proportions of a smaller num- 
ber of topics. PLSA is related to latent 
Dirichlet allocation but uses a point 
estimate over the proportions for each 
document rather than a Dirichlet prior. 
Simply applying principal component 
analysis to the words × documents
matrix is known as latent semantic 
analysis; PLSA provides a better model 
for such data, taking into account con- 
straints such as the non-negativity of 
word counts. See also topic model. 
[Mur12:24.2]


probabilistic model learning: The pro- 


cess of parameter estimation in a 
statistical model or model selection 
over a number of models. See also 
statistical learning. [Bis06:pp. 1-4]


probabilistic principal component 


analysis: A technique defining a prob- 
ability model for principal component 
analysis (PCA). The original data is 
modeled as being generated by the 
reduced-dimensionality subspace typ- 
ical of PCA plus Gaussian noise. The 
model can be extended to a mixture 
model, trained using the expectation 
maximization (EM) algorithm. Proba- 
bilistic PCA is a special case of factor 
analysis. [Bis06:12.2] 


probabilistic relaxation: A method 
of data interpretation in which local 
inconsistencies act as inhibitors and 
local consistencies act as excitors. The 
hope is that the combination of these 
two influences constrains the probabil- 
ities. [CKP95] 


probabilistic relaxation labeling: An 
extension of relaxation labeling in 
which each entity to be labeled, e.g., 
each image feature, is not simply 
assigned to a label, but to a set of proba- 
bilities, each giving the probability that 
the feature could be assigned a spe- 
cific label. See also belief propagation. 
[BM02:2.9]


probability: A measure of the confi- 
dence one may have in the occurrence 
of an event, on a scale from 0
(impossible) to 1 (certain). The interpretation
of probability is a subject of intense 
debate; we focus on two prominent 
theories. The frequentist interpreta- 
tion defines probabilities as limiting 
proportions in an infinite ensemble of 
experiments. For instance, the
probability of getting any given number
from a die in a single throw is $1/6$.
The subjectivist
view regards probability as a subjec- 
tive degree of belief, which need not 
involve random variables; e.g., one can 
hold a degree of belief in the statement 
“Shakespeare’s plays were written by 
Francis Bacon”. Degrees of belief can 
be mapped onto the rules of prob- 
ability if they satisfy certain consis- 
tency rules known as the Cox axioms. 
[Mac03:2.2]. 


probability density: See probability 
density function. 


probability density estimation: A class 
of techniques for estimating the den- 
sity function or its parameters given a 
sample from a population. A related 
problem is testing whether a partic- 
ular sample has been generated by 
a process characterized by a particu- 
lar probability distribution. Two com- 
mon tests are the goodness-of-fit test 
and the Kolmogorov-Smirnov test. 
The former is a parametric test best 
used with large samples; the latter 
gives good results with smaller sam- 
ples, but is a non-parametric test and, 


as such, does not produce estimates 
of the population parameters. See also 
non-parametric method. [Nal93:A2.2] 


probability density function: For 
a continuous random variable
X, the probability density func-
tion p(x) is the derivative of the
cumulative distribution function
F(x), so that $p(x) = dF(x)/dx$ and
$F(x) = \int_{-\infty}^{x} p(t)\,dt$. If $\Delta x$ is small,
then $\Pr(x \le X \le x + \Delta x) \approx p(x)\,\Delta x$.
This definition can be extended 
to vector-valued random variables. 
[Bis06:1.2.1] 


probability distribution: A normalized 
statistical distribution described with 
probabilistic likelihoods for the occur- 
rence of each possible outcome for a 
given variable over a number of trials. 
See also expectation value. 


probe data: Data acquired by remote 
sensing or parameters (such as loca- 
tion) describing remote sensing. 


probe image: 1) An image acquired by 
remote sensing, such as by an endo- 
scope. Such images usually exhibit 
high levels of geometric distortion. 
2) An image used to interrogate an 
image dataset. [SKI+00] 


probe pattern: A pattern of projected 
light from a probe that can be used to 
assess depth from defocus. 


probe set: A set of data used to search a 
dataset for a match. [PMRR00]


probe video: 1) A video taken using 
remote sensing. See probe image. 
2) A video to be used for the retrieval of 
some object or video within a known 
dataset. [ZKC03] 


procedural representation: A class of 
representations used in artificial intel- 
ligence to encode how to perform a 
task (procedural knowledge). A classic 
example is the production system. In 
contrast, declarative representations 
encode how an entity is structured. 
[Sch89:Ch. 7] 


Procrustes analysis: A method for 
comparing two data sets through 
the minimization of squared errors, 
by translation, rotation and scaling. 
[WP:Procrustes_analysis] 
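
As an illustration, the following is a minimal sketch of Procrustes alignment for 2D point sets, assuming the points are already in correspondence and that reflections are not allowed. The function name `procrustes_2d` and the use of complex numbers to encode rotation and scale are choices made for this example only.

```python
def procrustes_2d(src, dst):
    """Align src to dst (lists of corresponding (x, y) pairs) by translation,
    rotation and uniform scaling. Returns (aligned_points, residual)."""
    n = len(src)
    zs = [complex(x, y) for x, y in src]
    zd = [complex(x, y) for x, y in dst]
    mean_s = sum(zs) / n
    mean_d = sum(zd) / n
    a = [z - mean_s for z in zs]   # centered source shape
    b = [z - mean_d for z in zd]   # centered target shape
    # One complex factor w encodes rotation (its angle) and scale (its modulus).
    # It minimizes sum |w*a_i - b_i|^2; assumes src points are not all identical.
    w = sum(bi * ai.conjugate() for ai, bi in zip(a, b)) / sum(abs(ai) ** 2 for ai in a)
    aligned = [w * ai + mean_d for ai in a]
    residual = sum(abs(p - q) ** 2 for p, q in zip(aligned, zd))
    return [(p.real, p.imag) for p in aligned], residual
```

The returned residual (the sum of squared distances after alignment) can serve as a Procrustes-style distance between the two shapes.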


Procrustes average: Procrustes analysis


aligns a pair of shapes and defines 


a Procrustes distance between the 
two shapes. Given a set of shapes, 
one can find a reference shape that 
minimizes the Procrustes distance 
between each shape and the refer- 
ence shape. This reference shape is the 
Procrustes average. [WP:Generalized_ 
Procrustes_analysis] 


production system: 1) An approach 
to computerized logical reasoning, 
whereby the logic is represented as 
a set of “production rules”. A rule is 
of the form “LHS → RHS”. This states
that if the pattern or set of conditions 
encoded in the left-hand side (LHS) are 
true or hold, then do the actions speci- 
fied in the right-hand side (RHS), which 
may simply be the assertion of some 
conclusion. A sample rule might be 
“If the number of detected edge frag- 
ments is less than 10, then decrease the 
threshold by 10%”. 
2) An industrial system that manufac- 
tures some product. 
3) A system that is to be used, as 
compared to a demonstration system. 
[Sch89:Ch. 7] 


profiles: A shape signature for image 
regions, specifying the number of pix- 
els in each column (vertical profile) 
or row (horizontal profile). Used in 
pattern recognition. See also shape and 
shape representation. [SOS00:4.9.2] 
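
A short sketch of how the two profiles could be computed, assuming the region is given as a binary mask stored as a list of rows (the name `profiles` is illustrative):

```python
def profiles(region):
    """region: list of rows of 0/1 values.
    Returns (vertical, horizontal): pixel counts per column and per row."""
    vertical = [sum(column) for column in zip(*region)]
    horizontal = [sum(row) for row in region]
    return vertical, horizontal

region = [
    [0, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 0],
]
vertical, horizontal = profiles(region)
```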


progressive image transmission: A 
method of transmitting an image in 
which a low-resolution version is first 
transmitted, followed by details that 
allow progressively higher resolution 
versions to be recreated: [SGL92] 


[Figure: three versions of an image at increasing resolution, labeled FIRST IMAGE, BETTER IMAGE and BEST IMAGE]


progressive scan camera: A camera 
that transfers an entire image in the 
order of left-to-right, top-to-bottom, 
without the alternate line interlaced
scanning used in television standards. 
This is much more convenient for 
machine vision and other computer- 
based applications. [WP:Digital_ 
video#Technical_overview] 


projection: 1) The transformation of a 

geometric structure from one space to 
another, e.g., the projection of a 3D 
point onto the nearest point in a given 
plane. The projection may be specified 
by a linear function, i.e., for all points 
P in the initial structure, the points p' 
in the projected structure are given by 
p' = MP for some matrix M. Alterna-
tively, the projection need not be lin-
ear, e.g., p' = f(P).
2) The specific case of projection of a 
scene that creates an image on a plane 
by use of, e.g., a perspective camera, 
according to the rules of perspective. 
[Nal93:2.1] 


projection matrix: The matrix trans- 
forming the homogeneous projective 
coordinates of a 3D scene point 
(x, y, z, 1) into the pixel coordinates 
(u, v, 1) of the point's image in a
pinhole camera. It can be factored 
as the product of the two matri- 
ces of the intrinsic parameters and 
extrinsic parameters. See also camera 
coordinates, image coordinates and 
scene coordinates. [FP03:2.2-2.3] 
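
The factorization can be sketched as follows, assuming a 3 × 3 intrinsic matrix K and extrinsic parameters given as a rotation R and translation t, so that the projection matrix is K [R | t]. The helper names and the numeric values are hypothetical and chosen only for illustration.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def projection_matrix(K, R, t):
    """Build the 3x4 matrix K [R | t] from intrinsic and extrinsic parameters."""
    Rt = [list(R[i]) + [t[i]] for i in range(3)]
    return matmul(K, Rt)

def project(P, point):
    """Map a 3D scene point (x, y, z) to pixel coordinates (u, v)."""
    hom = list(point) + [1.0]                       # homogeneous (x, y, z, 1)
    u, v, w = (sum(P[i][k] * hom[k] for k in range(4)) for i in range(3))
    return u / w, v / w                             # divide out the scale

K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]   # assumed intrinsics
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]           # camera aligned with scene
t = [0.0, 0.0, 0.0]
P = projection_matrix(K, R, t)
```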


projective geometry: A field of geome- 
try dealing with projective spaces and 
their properties. A projective geome- 
try is one where only properties pre- 
served by projective transformations 
are defined. Projective geometry pro- 
vides a convenient and elegant the- 
ory to model the geometry of the 
common perspective camera. Most 
notably, the perspective projection 
equations become linear. [FP03:13.1] 

projective invariant: A property,
say I, that is not affected by a
projective transformation. More specif-
ically, assume an invariant, I(P), of
a geometric structure described by a
parameter vector P. When the struc-
ture is subject to a projective trans-
formation (M) this gives a structure
with parameter vector p, and I(P) =
I(p). The most fundamental projec-
tive invariant is the cross ratio. In
some applications, invariants of weight
w occur, which transform as $I(p) =
I(P)(\det M)^w$. [TV98:10.3.2]


projective plane: A plane, usually 
denoted by $P^2$, on which a projective
geometry is defined. [TV98:A.4] 


projective reconstruction: The prob- 
lem of reconstructing the geometry 
of a scene from a set or sequence of 
images in a projective space. The trans- 
formation from projective to Euclidean 
coordinates is easy if the Euclidean 
coordinates of the five points in a 
projective basis are known. See also 
projective geometry and projective 
stereo vision. [WP:Fundamental_ 


matrix_(computer_vision)#Projective_ 


Reconstruction_Theorem] 


projective space: A space of (n+ 1)- 
dimensional vectors, usually denoted 
by $P^n$, on which a projective geometry
is defined. [FP03:13.1.1] 


projective stereo vision: A class of 
stereo algorithms based on projective 
geometry. Key concepts expressed ele- 
gantly by the projective framework 
are epipolar geometry, fundamental 
matrix and projective reconstruction. 
[LZ99] 


projective stratum: A layer in the 
stratification of 3D geometries. Moving 
from the simplest to the most com- 
plex, we have the projective, affine, 
metric and Euclidean strata. See also 
projective geometry and projective 
reconstruction. [Fau95] 


projective transformation: Also 
known as “projectivity”, from one 
projective plane to another. It can 
be represented by a non-singular 
3 × 3 matrix acting on homogeneous
coordinates. The transformation has 
eight degrees of freedom, as only 
the ratio of projective coordinates is 
significant. [FP03:2.1.2] 


property-based matching: The pro- 
cess of comparing two entities (e.g., 
image features or patterns) using 
their properties, e.g., the moments 
of a region. See also classification, 


boundary property and metric 


property. [RC96] 


property learning: A class of algorithms 
aiming at learning and characterizing 
attributes of spatio-temporal patterns. 
For example, learning the color and 
texture distributions that differentiate 
between normal and cancerous cells.
See also boundary property, metric 
property, unsupervised learning and 
supervised learning. [WP:Supervised_ 
learning] 


prototype: An object or model serving 
as a representative example for a class, 
capturing the defining characteristics 
of the class. [WP:Prototype] 


proximity matrix: A matrix M occur-
ring in cluster analysis. M(i, j) denotes
the distance (e.g., the Hamming
distance) between, e.g., clusters i and
j. [SL90]


pseudocolor: A way of assigning a color 
to pixels that is based on an interpre- 
tation of the data rather than the orig- 
inal scene color. The usual purpose of 
pseudocoloring is to label image pix- 
els in a useful manner. For example, 
one common pseudocoloring assigns 
different colors according to the local 
surface shape class. A pseudocoloring 
scheme for aerial or satellite images of 
the earth assigns colors according to 
the land type, such as water, forest, 
wheat field etc. [JKS95:7.7] 


PSF: See point spread function. 


psychophysics: A field of study that 
considers the relationship between 
human (or other organism) percep- 
tions and the physical stimuli which 
cause them. [Pal99:4.2] 


PTZ: See pan-tilt-zoom. 


purposive perception: Perception 
which is motivated directly by the 
purpose of the agent. For example 
a robot may need to search for a 
particular landmark to accomplish 
self-localization. [BC92] 


purposive vision: An area of computer 
vision linking perception with purpo- 
sive action; that is, modifying the posi- 
tion or parameters of an imaging sys- 
tem purposively, so that a visual task 
is facilitated or made possible. Exam-
ples include changing the lens parame-
ters to obtain information about depth, 
as in depth from defocus, or moving 
around an object to achieve full shape 
information. [Alo90] 


pyramid: A representation of an image 
including information at several spatial 
scales. The pyramid is constructed by 
the original image (maximum resolu- 
tion) and a scale operator that reduces 
the content of the image (e.g., a Gaus- 
sian filter) by discarding details at 


coarser scales: 
[Figure: image pyramid; one level is labeled 64x64]

Applying the operator and subsam-
pling the resulting image leads to the
next (lower-resolution) level of the 
pyramid. See also scale space, image 
pyramid, Gaussian pyramid, Laplacian 
pyramid and pyramid transform. 
[JKS95:3.3.2] 
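
A minimal sketch of pyramid construction follows. As a simplifying assumption, the scale operator here is a 2 × 2 block average (a box filter) combined with subsampling by two, rather than the Gaussian filter mentioned above; the function names are illustrative.

```python
def reduce_level(img):
    """Average each 2x2 block, halving the resolution in both directions."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1] +
              img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(w)] for r in range(h)]

def build_pyramid(img, levels):
    """Return [original, half resolution, quarter resolution, ...]."""
    pyramid = [img]
    for _ in range(levels - 1):
        top = pyramid[-1]
        if len(top) < 2 or len(top[0]) < 2:
            break                      # cannot reduce a single row or column
        pyramid.append(reduce_level(top))
    return pyramid
```

Each level holds a quarter of the pixels of the level below it, so the whole pyramid costs only about one third more storage than the original image.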


pyramid architecture: A computer


architecture supporting pyramid-based 
processing, typically occurring in 
the context of multi-scale process- 
ing. See also scale space, pyramid, 
image pyramid, Laplacian pyramid and 
Gaussian pyramid. [JKS95:3.3.2] 


pyramid transform: An operator for


building a pyramid from an image. See 
pyramid, image pyramid, Laplacian 
pyramid and Gaussian pyramid. 
[JKS95:3.3.2] 


pyramid vector quantization: A


fast method of implementing vector 
quantization based on a set of lattice 
points that fall on a hyperpyramid. 
[GG92:p. 465] 


QBIC: See query by image content. 
[WP:Content-based_image_retrieval# 
Other_query_methods] 


quadratic form: A quadratic form is 
a homogeneous polynomial of degree 
two in a number of variables. Given 
a vector of d real variables $x =
(x_1, \ldots, x_d)^\top$, a $d \times d$ real matrix A
defines a quadratic form in x as $Q(x) =
x^\top A x$. [Nob69:12.2]


quadratic variation: 1) Any function 
(here, expressing a variation of some 
variables) that can be modeled by a 
quadratic polynomial. 
2) The specific measure of surface
shape deformation $f_{xx}^2 + 2 f_{xy}^2 + f_{yy}^2$ of
a surface f(x, y). This measure has
been used to constrain the smoothness 
of reconstructed surfaces. [Hor86:8.2] 


quadrature mirror filter: A class of fil- 
ters occurring in wavelet and image 
compression filtering theory. The fil- 
ter splits a signal into a high pass 
component and a low pass compo- 
nent, with the low pass component’s 
transfer function a mirror image of 
that of the high pass component. 
[WP:Quadrature_mirror_filter] 


quadric: A surface defined by a second- 
order polynomial. See also conic. 
[FP03:2.1.1] 


quadric patch: A quadric surface 
defined over a finite region of the 


independent variables or parameters; 
e.g., in range image analysis, a part of 
a range surface that is well approxi- 
mated by a quadric (e.g., an elliptical 
patch). [WP:Quadric] 


quadric patch extraction: A class of 


algorithms aiming to identify the por- 
tions of a surface that are well approxi- 
mated by quadric patches. Techniques 
are similar to those applied for conic 
fitting. See also surface fitting and least 
square surface fitting. [FFE97] 


quadrifocal tensor: An algebraic con- 


straint imposed on quadruples of cor- 
responding points by the geometry of 
four simultaneous views, analogous to 
the epipolar constraint for the two- 
camera case and to the trifocal tensor 
for the three-camera case. See also 
stereo correspondence and epipolar 
geometry. [FP03:10.3] 


quadrilinear constraint: The geomet- 


ric constraint on four views of a point 
(i.e., the intersection of four epipolar
lines). See also epipolar constraint and 
trilinear constraint. [FP03:10.3] 


quadtree: A hierarchical structure rep- 


resenting 2D image regions, in which 
each node represents a region, and 
the whole image is the root of the 
tree. Each non-leaf node, representing 
a region R, has four children, that rep- 
resent the four subregions into which 
R is divided: 
Hierarchical subdivision continues
until the remaining regions have con- 
stant properties. Quadtrees can be 
used to create a compressed image 
structure. The 3D extension of a 
quadtree is the octree. [SQ04:5.9.1] 
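
The subdivision can be sketched as a short recursive function, assuming a square image whose side is a power of two and taking "constant properties" to mean identical pixel values. The nested-list tree layout and the names below are assumptions made for this example.

```python
def build_quadtree(img, r0=0, c0=0, size=None):
    """Return a leaf value if the block is constant, otherwise a list of
    four subtrees in the order [NW, NE, SW, SE]."""
    if size is None:
        size = len(img)
    first = img[r0][c0]
    if all(img[r][c] == first
           for r in range(r0, r0 + size) for c in range(c0, c0 + size)):
        return first                   # constant region: stop subdividing
    half = size // 2
    return [build_quadtree(img, r0, c0, half),
            build_quadtree(img, r0, c0 + half, half),
            build_quadtree(img, r0 + half, c0, half),
            build_quadtree(img, r0 + half, c0 + half, half)]

def count_leaves(tree):
    """Number of leaf regions, a rough measure of the compressed size."""
    return sum(count_leaves(t) for t in tree) if isinstance(tree, list) else 1
```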


qualitative vision: A paradigm based on 


the idea that many perceptual tasks 
could be better accomplished by com- 
puting only qualitative descriptions 
of objects and scenes from images, 
as opposed to quantitative informa- 
tion, such as accurate measurements. 
Suggested in the framework of com-
putational theories of human vision.
[Nal93:10] 


quantization: See spatial quantization. 


quantization error: The approximation 
error created by the quantization of 
a continuous variable, typically using 
a regularly spaced scale of values. 
The figure shows a continuous func- 
tion (dashed) and its quantized version 
(solid line) using six values only: 


The quantization error is the vertical 
distance between the two curves. For 
instance, the intensity values in a dig- 
ital image can only take on a certain 
number of discrete values (often 256). 
See also sampling theorem and Nyquist 
sampling rate. [SQ04:4.2.1] 
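
As a small worked example, the sketch below quantizes a value to the nearest of a fixed number of regularly spaced levels and measures the resulting error. The function name `quantize` and the choice of range [0, 1] with six levels are assumptions for illustration.

```python
def quantize(value, lo, hi, levels):
    """Map a value in [lo, hi] to the nearest of `levels` regularly spaced values."""
    step = (hi - lo) / (levels - 1)
    index = round((value - lo) / step)
    index = max(0, min(levels - 1, index))     # clamp values outside the range
    return lo + index * step

value = 0.37
quantized = quantize(value, 0.0, 1.0, 6)       # levels are 0.0, 0.2, ..., 1.0
error = abs(value - quantized)                 # the quantization error
```

With regularly spaced levels and in-range inputs, the error never exceeds half the spacing between levels.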


quantization noise: See quantization 
error. 


quasi-affine reconstruction: A 
projective reconstruction with 
an additional constraint that the 
plane at infinity is not split. This 
is an intermediate level between 
projective reconstruction and affine 
reconstruction. [Nis01]


quasi-invariant: An approximation of 
an invariant. For instance, quasi- 
invariant parameterizations of image 
curves have been built by approxi- 
mating the invariant arc length with 


lower spatial derivatives. [WP:Quasi- 
invariant_measure] 


quaternion: A forerunner of the modern 
vector concept, invented by Hamilton, 
used in vision to represent rotations. 
Any 3D-rotation matrix, R, can be 
parameterized by a vector of four 
numbers, $q = (q_0, q_1, q_2, q_3)$ where
$\sum_{i=0}^{3} q_i^2 = 1$, that uniquely define the
rotation. A rotation has two represen-
tations, $q$ and $-q$. See rotation matrix
for alternative representations of rota- 
tions. [FP03:21.3.1] 
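
The following sketch converts a unit quaternion to a rotation matrix, assuming the scalar-first convention (q0 is the scalar part) and the usual axis-angle relation q = (cos(a/2), n sin(a/2)). The function names are illustrative.

```python
import math

def axis_angle_to_quat(axis, angle):
    """Unit quaternion (q0, q1, q2, q3) for a rotation by `angle` about `axis`."""
    ax, ay, az = axis
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    s = math.sin(angle / 2.0) / norm
    return (math.cos(angle / 2.0), ax * s, ay * s, az * s)

def quat_to_matrix(q):
    """3x3 rotation matrix of a unit quaternion (scalar part first)."""
    q0, q1, q2, q3 = q
    return [
        [1 - 2 * (q2 * q2 + q3 * q3), 2 * (q1 * q2 - q0 * q3), 2 * (q1 * q3 + q0 * q2)],
        [2 * (q1 * q2 + q0 * q3), 1 - 2 * (q1 * q1 + q3 * q3), 2 * (q2 * q3 - q0 * q1)],
        [2 * (q1 * q3 - q0 * q2), 2 * (q2 * q3 + q0 * q1), 1 - 2 * (q1 * q1 + q2 * q2)],
    ]
```

Because every entry of the matrix is quadratic in the components of q, negating all four components leaves the matrix unchanged, which is why each rotation has two quaternion representations.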


query by image content (QBIC): A 
class of techniques for selecting mem- 
bers from a database of images by using 
examples of the desired image content 
(as opposed to textual search). Exam- 
ples of contents include color, shape, 
and texture. See also image database 
indexing. [WP:Content-based_image_ 
retrieval#Other_query_methods] 


R-S curve: A contour representation giv-
ing the distance, r, of each point of the
contour from an origin, chosen arbi- 
trarily, as a function of the arc length, 
s. Allows rotation-invariant compari-
son of contours. See also contour, shape
representation. [SHLW89]


radar: An active sensor detecting the 
presence of distant objects. A nar- 
row beam of very high-frequency radio 
pulses is transmitted and reflected by 
a target back to the transmitter. The 
direction of the reflected beam and the 
time of flight of the pulse determine 
the target's position. See also time-
of-flight range sensor. [TV98:2.5.2]


radial basis function (RBF): A function 


whose value depends only on the radial
distance of the input x from some cen-
ter c, i.e., $\phi(x) = \phi(\|x - c\|)$. Typically
Euclidean distance is used, and a very
common choice for the radial function
is $\phi(r) = e^{-\beta r^2}$ (for some positive con-
stant $\beta$). [Bis06:6.3]
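
A minimal sketch of a Gaussian radial basis function, assuming Euclidean distance and a positive width parameter called `beta` here (the name is an assumption for this example):

```python
import math

def gaussian_rbf(x, center, beta=1.0):
    """Value exp(-beta * r^2), where r is the Euclidean distance from x to center."""
    r2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-beta * r2)
```

Any two inputs at the same distance from the center receive the same value, which is the defining property of a radial function.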


radial basis function network: A 


neural network composed of radial 
basis function units. It is used for 
supervised learning problems and 
maps input data x to a space of outputs
y. The standard RBF network architec-
ture has a hidden layer of RBF units 
connected to the input x; the num- 
ber of units in this layer is arbitrary. 
The outputs of these hidden units are 
then connected to the output layer 


by a matrix of parameters (weights), 
optionally with an additional nonlin- 
earity. One common way of specifying 
the centers in the RBF network is to 
randomly select a subset of the data 
points. [Bis06:6.3] 


radial lens distortion: A type of 


geometric distortion introduced by 
a real lens. The effect is to shift 
the position of each image point, 
p, away from its true position along 
the line through the image center 
and p. See also lens, lens distortion, 
barrel distortion, tangential distortion, 
pincushion distortion and distortion 
coefficient. This figure shows the typi- 
cal deformations of a square (exagger- 
ated): [FP03:3.3] 


radiance: The amount of light (radiat- 


ing energy) leaving a surface. The light 
can be generated by the surface itself, 
as in a light source, or reflected by it. 
The surface can be real (e.g., a wall) 
or imaginary (e.g., an infinite plane). 
See also irradiance and radiometry. 
[FP03:4.1.3] 


radiance map: A map of radiance for 


a scene. Sometimes used to refer 
to a high dynamic range image. 
[FP03:4.1.3] 


radiant flux: The radiant energy per 


time unit; that is, the amount of 
energy transmitted or absorbed per 
time unit. See also radiance, irradiance 
and radiometry. [Hec87:3.3.1] 


radiant intensity: See radiant flux.


radiation: 1) Any form of emission in the 


electromagnetic spectrum. 

2) Emissions given off by radioactive 
materials, which could include elec- 
tromagnetic emissions (i.e., gamma ray 
photons). [WP:Radiation] 


radiometric calibration: A process 


seeking to estimate radiance from 
pixel values. The rationale for radio- 
metric calibration is that the light 
entering a real camera (the radiance) 
is, in general, altered by the camera 
itself. A simple calibration model is 
$E(i, j) = g(i, j)\,I(i, j) + o(i, j)$, where, for
each pixel (i, j), E is the radiance to
estimate, I is the measured intensity,
and g and o are a pixel-specific gain 
and offset to be calibrated. Ground 
truth values for E can be measured 
using a photometer (see photometry). 
[WP:Radiometric_calibration] 


radiometric response function: A 


function that defines how recorded 
irradiance values are transformed into 
intensity values within an imaging 
device. [LZ05] 


radiometry: The measurement of 


optical radiation, i.e., electromag-
netic radiation between $3 \times 10^{11}$ and
$3 \times 10^{16}$ Hz (wavelengths between
0.01 and 1000 μm). This includes
ultraviolet, visible and infrared radia-
tion. Common units encountered are
watts/m² and photons/(sec·steradian).
Compare with photometry, which is the measure-
ment of visible light. [FP03:4.1]


radius vector function: A contour or 


boundary representation based about a 
point c in the center of the figure (usu- 
ally the center of gravity or a physically 
meaningful point). The representation 
then records the distance r(θ) from c
to points on the boundary, as a func-
tion of θ, which is the angle between
the direction and some reference direc-
tion. The representation has problems
when the vector at angle θ intersects
the boundary more than once: [Kin03]

[Figure: radius vector r(θ) from the center c at angle θ]


Radon transform: A transformation 


mapping an image into a parameter 
space highlighting the presence of 
lines. It can be regarded as an exten- 
sion of the Hough transform. One def- 
inition is 


$R(\rho, \theta) = \iint I(x, y)\,\delta(\rho - x\cos\theta - y\sin\theta)\,dx\,dy$


where I(x, y) is the image (gray val-
ues) and $\rho = x\cos\theta + y\sin\theta$ is a para-
metric line in the image. Lines are
identified by peaks in the $(\rho, \theta)$ space.
See also Hough transform line finder. 
[Jai89:10.2] 


RAG: See region adjacency graph. 


random access camera: A camera char- 


acterized by the possibility of access- 
ing any image location directly, unlike 
a sequential scan camera, in which 
image values are transmitted in a stan- 
dard order. [GAHK91] 


random dot stereogram: A stereo pair 


formed by one random dot image (that 
is, a binary image in which each pixel is 
assigned to black or white at random) 
and a second image that is derived from 
the first. The figure shows an example 
in which a central square is shifted hor- 
izontally: 


Looking cross-eyed at close distance, 
you should perceive a strong 3D effect. 
See also stereo and stereo vision. 
[Nal93:7.1] 


random forest: An example of 


ensemble learning where a classifier 
is made up of many decision trees. 
Diversity between the decision trees
is obtained by training on random 
subsets of the training set or by the 
random selection of features. The 
overall classification result for a given 
example is obtained by bagging over 
the set of learned decision trees. 
[HTF08:Ch. 15] 


random process: See stochastic 
process. 
random sample consensus: See 
RANSAC. 


random variable: A scalar or a vec- 
tor variable that takes on a random 
value. The set of possible values may be 
describable by a standard distribution, 
such as Gaussian distribution, Gaussian 
mixture model, uniform distribution or 
Poisson distribution. [Nal93:A2.2] 


randomized Hough transform: A vari- 
ation of the standard Hough transform 
designed to produce higher accuracy 
with less computational effort. The 
line-finding variant of the algorithm 
selects pairs of image edge points ran- 
domly and increments the accumu- 
lator cell corresponding to the line 
through these two points. The selec- 
tion process is repeated a fixed number 
of times. [WP:Randomized_Hough_ 
Transform] 
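
The line-finding variant can be sketched as below, assuming lines are parameterized as rho = x cos(theta) + y sin(theta) with theta in [0, pi) and that the accumulator is a sparse counter over quantized (rho, theta) cells. The cell sizes, iteration count and function name are assumptions for illustration; the wrap-around between theta near 0 and theta near pi is ignored in this sketch.

```python
import math
import random
from collections import Counter

def randomized_hough_line(points, iterations=300, rho_step=1.0,
                          theta_step=math.radians(1.0), seed=0):
    """Return (rho, theta) at the center of the most-voted accumulator cell."""
    rng = random.Random(seed)            # fixed seed keeps the sketch repeatable
    accumulator = Counter()
    for _ in range(iterations):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        # The line normal is perpendicular to the direction (x2 - x1, y2 - y1).
        theta = math.atan2(x2 - x1, -(y2 - y1))
        if theta < 0:
            theta += math.pi             # fold the normal into [0, pi)
        elif theta >= math.pi:
            theta -= math.pi
        rho = x1 * math.cos(theta) + y1 * math.sin(theta)
        cell = (round(rho / rho_step), round(theta / theta_step))
        accumulator[cell] += 1           # one vote per sampled pair
    (rho_idx, theta_idx), _votes = accumulator.most_common(1)[0]
    return rho_idx * rho_step, theta_idx * theta_step
```

Only cells that actually receive votes are stored, which is one reason the randomized variant needs less memory and computation than filling a dense accumulator from every edge point.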


range compression: Reducing the 
dynamic range of an image to enhance 
the appearance of the image. This is 
often needed for images resulting from 
the magnitude of the Fourier transform 


which might have pixels with both 


large and very low values. Without 
range compression, it is hard to see 
the structure in the pixels with low val- 
ues. In the figure, you can see (left) the 
magnitude of a 2D Fourier transform 
with a single bright spot in the middle 
and (right) the logarithm of that image, 
revealing more details: [Jai89:7.2] 
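
A minimal sketch of logarithmic range compression, assuming non-negative input values (such as Fourier magnitudes) and an output range of 0 to 255; the function name and the scaling convention are assumptions for this example.

```python
import math

def range_compress(img, out_max=255.0):
    """Apply v -> c * log(1 + v), with c chosen so the largest value maps to out_max."""
    peak = max(max(row) for row in img)
    scale = out_max / math.log1p(peak) if peak > 0 else 0.0
    return [[scale * math.log1p(v) for v in row] for row in img]

magnitudes = [[0.0, 10.0, 100000.0]]
compressed = range_compress(magnitudes)
```

Under plain linear scaling the middle value above would map to about 0.03 out of 255 and be invisible; after compression it maps to roughly 53.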


range data: A representation of the spa-
tial distribution of a set of 3D points.
The data is often acquired by stereo 
vision or by a range sensor. In com- 
puter vision, range data are often rep- 
resented as a cloud of points, i.e., a 
set of triplets representing the X, Y, Z 
coordinate of each point, or as range 
images, also known as moiré patches. 
The figure shows a range image of an 
industrial part, where brighter pixels 
are closer: [TV98:2.5] 


range data fusion: The merging of mul- 
tiple sets of range data, especially for 
the purpose of extending the por- 
tion of an object’s surface described 
by the range data, or increasing the 
accuracy of measurements by exploit- 
ing the redundancy of multiple mea- 
sures available for each point of the sur- 
face area. See also information fusion, 
fusion, sensor fusion. [Haa94] 


range data integration: See range data 
fusion. 


range data registration: See
registration.


range data segmentation: A class of 
techniques partitioning range data into 
a set of regions. For instance, HK 
segmentation produces a set of sur- 
face patches. In the figure, the plane, 
cylinder and spherical patches on the 
right have been extracted from the 
range image on the left. See also surface 
segmentation: [Nev82:9.3] 


range edge: See surface shape
discontinuity.


range flow: A class of algorithms for 
the measurement of motion in time- 
varying range data, made possible by 
the evolution of fast range sensors. See 
also optical flow. [YBBR93] 


range image: A representation of range 
data as an image. The pixel coordinates 
are related to the spatial position of 
each point on the range surface and the 
pixel value represents the distance of 
the surface point from the sensor (or 
from an arbitrary, fixed background). 
The figure shows a range image of a 
face, in which darker pixels are closer: 
[JKS95:11.4] 


range image edge detector: An edge 
detector working on range images. 
Typically, edges occur where depths 
or surface normal directions (fold 
edges) change rapidly. See also edge 
detection and range images. In the fig- 
ure, the depth and fold edges on the 
right have been extracted from the 


range image on the left: [JB99] 


range sensor: Any sensor acquiring
range data. The most popular range 
sensors in computer vision are based 
on optical and acoustic technolo- 
gies. A laser range sensor often uses 
structured light triangulation. A time- 
of-flight range sensor measures the 
round-trip time of an acoustic or opti- 
cal pulse. See also depth estimation. 
The figure shows a triangulation range 
sensor: [TV98:2.5.2] 


rank-based gradient estimation: Esti-
mation of the gradient value (rate
of change) in the presence of noise 
through the use of an ordered list of 
local data points. 


rank order filtering: A class of filters 
the output of which depends on an 
ordering (ranking) of the pixels within 
the region of support. The classic 
example is the median filter which 
selects the middle value of the set of 
input values. More generally, the filter 
selects the kth largest value in the input 
set. [SB11:4.4.3] [AFP91] 
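
A minimal sketch of a rank order filter on a 2D image, assuming a square region of support that is clipped at the image border (one of several possible border policies). The function name and parameters are illustrative.

```python
def rank_order_filter(img, k, radius=1):
    """Replace each pixel by the k-th largest value (k = 1 is the maximum)
    in its (2*radius + 1) x (2*radius + 1) neighborhood."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            window = [img[rr][cc]
                      for rr in range(max(0, r - radius), min(h, r + radius + 1))
                      for cc in range(max(0, c - radius), min(w, c + radius + 1))]
            window.sort(reverse=True)
            # Clipped border windows are smaller, so cap k at the window size.
            row.append(window[min(k, len(window)) - 1])
        out.append(row)
    return out
```

With a 3 × 3 window, k = 5 gives the median filter, k = 1 the maximum filter and k = 9 the minimum filter.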


RankBoost: A pairwise method for 
addressing the ranking problem using 
the boosting method. [FISS03] 


ranking function: Any function that 
solves the ranking problem. [SC04:8.1] 


ranking problem: The problem of 
determining the order (or rank) of a 
number of items. [SC04:8.1] 


rank support vector machine (SVM): 
A pairwise method for addressing 
the ranking problem using a support 
vector machine. See also support 
vector ranking. [SC04:8.1] 


RANSAC: Acronym from random sam- 
ple consensus, a robust estimator seek- 
ing to counter the effect of outliers in 
data used, e.g., in a least square esti-
mation problem. In essence, RANSAC
considers a number of data subsets of 
the minimum size necessary to solve 
the problem (e.g., a parametric surface 
fit), then looks for statistical agreement 
of the results. See also least median of 
squares estimation, M-estimation and 
outlier rejection. [FP03:15.5.2] 
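
The idea can be sketched for 2D line fitting, where the minimum subset size is two points. This is an illustrative version under stated assumptions: a fixed number of iterations, a fixed inlier distance tolerance, and a final least-squares refit of y = m x + q on the consensus set (so the line must not be vertical). All names and default values are hypothetical.

```python
import random

def ransac_line(points, iterations=200, tol=0.5, seed=0):
    """Return (m, q, inliers) for the line y = m*x + q with the largest consensus."""
    rng = random.Random(seed)                  # fixed seed keeps the sketch repeatable
    best_inliers = []
    for _ in range(iterations):
        (x1, y1), (x2, y2) = rng.sample(points, 2)     # minimal subset
        a, b = y2 - y1, x1 - x2                # line a*x + b*y + c = 0 through both
        norm = (a * a + b * b) ** 0.5
        if norm == 0:
            continue                           # coincident points define no line
        c = -(a * x1 + b * y1)
        inliers = [(x, y) for x, y in points
                   if abs(a * x + b * y + c) / norm <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Least-squares refit on the consensus set only.
    n = len(best_inliers)
    sx = sum(x for x, _ in best_inliers)
    sy = sum(y for _, y in best_inliers)
    sxx = sum(x * x for x, _ in best_inliers)
    sxy = sum(x * y for x, y in best_inliers)
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    q = (sy - m * sx) / n
    return m, q, best_inliers
```

Because the final fit uses only the consensus set, gross outliers have no influence on the estimated parameters, unlike an ordinary least-squares fit to all the data.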


raster scan: A “raster” is the region of a 


monitor, e.g., a cathode ray tube (CRT) 
or a liquid crystal display (LCD), capa- 
ble of rendering images. In a CRT, the 
raster is a sequence of horizontal lines 
that are scanned rapidly with an elec- 
tron beam from left to right and top 
to bottom, largely in the same way 
as a TV picture tube is scanned. In 
an LCD, the raster (usually called a 
“grid”) covers the whole device area 
and image elements are displayed indi- 
vidually. [Low91:4.2] 


rate distortion: A statistical method use- 


ful in analog-to-digital conversion. It 
determines the minimum number of 
bits required to encode data while tol- 
erating a given level of distortion or the 
amount of distortion created by using 
a given number of bits. [Jai89:2.13] 


rate-distortion function: The number 


of bits per sample (the rate $R_D$) to
encode an analog image (or other sig-
nal) value given the allowable dis-
tortion D (or mean square of the
error). Also needed is the variance $\sigma^2$
of the input value (assuming it is a
Gaussian random variable). Then $R_D =
\max(0, \frac{1}{2}\log_2(\sigma^2 / D))$. [Jai89:2.13]
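
As a worked example, the sketch below evaluates the standard rate-distortion result for a Gaussian source, R = max(0, (1/2) log2(variance / D)), which is assumed here to be the formula intended by this entry. The function name is illustrative.

```python
import math

def gaussian_rate(variance, distortion):
    """Bits per sample needed to encode a Gaussian value of the given variance
    with mean square error at most `distortion`."""
    return max(0.0, 0.5 * math.log2(variance / distortion))

bits = gaussian_rate(16.0, 1.0)    # variance 16, allowed mean square error 1
```

Each factor-of-four reduction in the allowed distortion costs one additional bit per sample, and no bits are needed once the allowed distortion reaches the variance.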


real-time processing: Any computa- 


tion performed within the time lim- 
its imposed by a given process. For 
example, in visual servoing a tracking 
system feeds positional data to a con- 
trol algorithm generating control sig- 
nals; if the control signals are gener- 
ated too slowly, the whole system may 
become unstable. Different processes 
can impose very different constraints 
for real-time processing. When pro- 
cessing video-stream data, “real time” 
means complete processing of one 
frame of data in the time before 
the next frame is acquired (possibly 
with several frames lag time, as in 
pipeline parallelism). [WP:Real-time_ 
computing] 


receiver operating curve (ROC): A 


diagram showing the performance 
of a classifier. It plots the number or 
percentage of true positives against 
the number or percentage of false 
positives as one varies some parameter 
of the classifier. See also performance 
characterization, test set and 
classification. [FP03:22.2.1] 
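
A minimal sketch of how the points of such a curve could be computed, assuming the varied parameter is a decision threshold on a classifier score, that higher scores indicate the positive class, and that both classes are present. Rates (fractions) are used rather than raw counts; the function name is illustrative.

```python
def roc_points(scores, labels):
    """Return a list of (false positive rate, true positive rate) pairs,
    one per threshold, starting from (0, 0). Labels are 1 (positive) or 0."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]                       # threshold above every score
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 0)
        points.append((fp / neg, tp / pos))
    return points
```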


receptive field: 1) The retinal area gen- 


erating the response to a photostim- 
ulus. The main cells responsible for 
visual perception in the retina are the 
rods and the cones, active in low- and
high-intensity situations respectively.
2) The region of visual space giving rise 
to photopic response. 

3) The region of an image that is input 
to the calculation of each output value. 
(See region of support.) [FP03:1.3] 


rate invariant action recognition:
The recognition of actions regard-
less of the speed at which they are
performed. See action recognition.
[VSRC09]


raw primal sketch: The first represen-
tation built in the perception process
according to Marr’s theory of vision,
heavily based on detection of local
edge features. It represents the loca-
tion, orientation, contrast and scale of
center-surround, edge, bar and trun-
cated bar features. See also primal
sketch. [MH80]


RBC: See recognition by components.


recognition: See identification.


recognition by components (RBC): 1)
A theory of human image understand-
ing devised by Biederman. The foun-
dation is a set of 3D shape primitives 
called geons, reminiscent of Marr’s 
generalized cones. Different combina- 
tions of geons yield a large variety 
of 3D shapes, including articulated 
objects. 

2) The recognition of a complex object 
by recognizing subcomponents and 
combining them to recognize more 
complex objects. See also hierarchical 
matching, shape representation, 
model-based recognition and object 


recognition. [WP:Recognition-by- 
components_theory] 

recognition by parts: See recognition 
by components. [WP:Object_recogni- 
tion_(computer_vision)] 


recognition by structural decomposi- 
tion: See recognition by components. 


reconstruction: The problem of com- 
puting the shape of a 3D object or sur- 
face from one or more intensity sensors 
or range images. Typical techniques 
include model acquisition and the 
many shape from X methods reported 
(see shape from contour and following 
entries). [TV98:7.4] 


reconstruction error: Inaccuracies in 
a model when compared to reality, 
caused by inaccurate sensing or com- 
pression. See lossy compression. 
[WP:3D_reconstruction_from_ 
multiple_images#Algebraic_vs_ 
geometric_error] 


rectification: A technique warping two 
images into some form of geomet- 
ric alignment, e.g., so that the verti- 
cal pixel coordinates of corresponding 
points are equal. See also stereo image 
rectification. The figure shows a stereo 
pair (top) and its rectified version (bot- 
tom), highlighting some of the scan- 
lines where corresponding image fea- 
tures lie: [JKS95:12.5] 


recursive region growing: A class 
of recursive algorithms for region 
growing. An initial pixel is chosen and 
its neighboring pixels are explored 
(the neighbors are determined by an 
adjacency rule, e.g., 8-adjacency). If 


any pixel meets the criteria for addition 
to the region, the growing procedure 
is called recursively on that pixel. The 
process continues until all connected 
image pixels have been examined. See 
also adjacency, image connectedness, 
neighborhood and recursive splitting. 
[SQ04:8.3.1] 


recursive splitting: A class of recur- 
sive algorithms for region segmenta- 
tion (dividing an image into a region 
set). The region set is initialized to 
the whole image. A homogeneity cri- 
terion is then applied; if not satis- 
fied, the image is split according to 
a given scheme (e.g., into four sub- 
images, as in a quadtree), leading to 
a new region set. The procedure is 
applied recursively to all regions in 
the new region set, until all remain- 
ing regions are homogeneous. See 
also region segmentation, region-based 
segmentation and recursive region 
growing. [Nev82:8.1.1] 


reference frame transformation: See 
coordinate system transformation. 
[WP:Rotating_reference_frame] 


reference image: An image of a known 
scene or of a scene at a particular 
time used for comparison with a cur- 
rent image. See, e.g., change detection. 
[SSB06]


reference plane: An arbitrary plane to 
which all other planes may be compared.
[RC02]


reference views: In iconic recognition, 
the views chosen as most rep- 
resentative for a 3D object. See 
also eigenspace-based recognition and 
characteristic view. [WS00]


reference white: A sample image value 
which corresponds to a known white 
object. The knowledge of such a value 
facilitates white balance correction. 
[WP:White_point] 


reflectance: The ratio of reflected to 
incident flux; in other words, the ratio 
of reflected to incident (light) power. 
See also bidirectional reflectance 
distribution function. [JKS95:9.1.2] 


reflectance estimation: A class of techniques
for estimating the bidirectional
reflectance distribution function
(BRDF). Used notably within the
techniques for shape from shading and
image-based rendering, which seek to
render arbitrary images of scenes from
video material, where all information about
geometry and photometry (e.g., the
BRDF) is derived from video. See also
physics-based vision. [FP03:4.2.2] 


reflectance map: The reflectance map 
expresses the reflectance of a material 
in terms of a viewer-centered represen- 
tation of local surface orientation. The 
most commonly used is the Lambertian 
reflectance map, based on Lambert’s 


law. See also shape from shading, 


photometric stereo. [JKS95:9.3] 


reflectance model: A model that 
represents how light is reflected 
by a material. See also Lambertian 


surface, Oren-Nayar model and Phong 


reflectance model. [Sze10:2.2.2] 


reflectance ratio: A photometric
invariant used for segmentation and
recognition. It is based on the observation
that the illumination on both
sides of a reflectance or color edge
is nearly the same. So, although we
cannot factor out the reflectance and
illumination from only the observed
lightness, the ratio of the lightnesses
on either side of the edge equals the
ratio of the reflectances, independent
of illumination. The ratio is thus
invariant to illumination and local
surface geometry for a significant
class of reflectance maps. See also
invariant and physics-based vision.
[NB93]


reflection: 1) A mathematical transformation
where the output image is the
input image flipped about a given transformation
line in the image plane. See
reflection operator.

2) An optics phenomenon whereby all
light incident on a surface is deflected
away, without absorption, diffusion or
scattering. An ideal mirror is the perfect
reflecting surface. Given a single
ray of light incident on a reflecting
surface, the angle of incidence equals
the angle of reflection, as shown in
the figure. See also specular reflection.
[WP:Reflection_(physics)]

[Figure: incident ray and reflected ray at a reflecting surface]


reflection operator: A linear transfor- 
mation that intuitively changes each 
vector or point of a given space to its 
mirror image: 


The corresponding transformation
matrix, $H$, has the property $HH = I$,
i.e., $H^{-1} = H$: a reflection matrix is its
own inverse. See also rotation. [SP89]
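
The self-inverse property can be checked numerically. The sketch below builds a reflection about the plane through the origin with a given normal, using the Householder form $H = I - 2nn^T$; this is one common construction and is an assumption of the example, as are the helper names `reflection_matrix` and `matmul`.

```python
import math


def reflection_matrix(normal):
    """Householder reflection H = I - 2 n n^T about the plane with the given normal."""
    length = math.sqrt(sum(c * c for c in normal))
    n = [c / length for c in normal]  # unit normal
    size = len(n)
    return [[(1.0 if i == j else 0.0) - 2.0 * n[i] * n[j] for j in range(size)]
            for i in range(size)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


H = reflection_matrix([1.0, 2.0, 2.0])
HH = matmul(H, H)  # should equal the identity up to rounding
```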


refraction: An optical phenomenon 
whereby a ray of light is deflected 
while passing through different optical 
media, e.g., from air to water: 


[Figure: an incident ray in medium 1 and the refracted ray in medium 2]


The amount of deflection is governed
by the difference between the refraction
indices of the two media, according
to Snell’s law:

$$n_1 \sin(\alpha_1) = n_2 \sin(\alpha_2)$$

where $n_1$ and $n_2$ are the refraction
indices of the two media and $\alpha_1$ and
$\alpha_2$ are the respective angles of incidence
and refraction, measured from the surface normal.
[Hec87:4.1]
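
A small worked example of Snell's law, written as a sketch: it assumes angles are measured from the surface normal and uses 1.0 and 1.33 as approximate refraction indices for air and water. The function name `refraction_angle` is hypothetical.

```python
import math


def refraction_angle(n1, n2, alpha1):
    """Angle of refraction in radians, or None on total internal reflection."""
    s = n1 * math.sin(alpha1) / n2  # from n1*sin(alpha1) = n2*sin(alpha2)
    if abs(s) > 1.0:
        return None  # no refracted ray exists
    return math.asin(s)


alpha1 = math.radians(30.0)
alpha2 = refraction_angle(1.0, 1.33, alpha1)                 # air to water
blocked = refraction_angle(1.33, 1.0, math.radians(60.0))    # water to air, steep angle
```

Passing into the denser medium bends the ray towards the normal, so `alpha2` is smaller than `alpha1`; in the reverse direction a sufficiently oblique ray is not refracted at all.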


region: A connected part of an image, 


usually homogeneous with respect to 
a given criterion. [BB82:5.1] 


region adjacency graph (RAG): A 
graph expressing the adjacency rela- 
tions among image regions, e.g., gen- 
erated by a segmentation algorithm. 
See also region segmentation and 
region-based segmentation. In the fig- 
ure, the adjacency relations of the 
regions on the left are encoded in the 
RAG on the right: [JKS95:3.3.4] 


region-based active contours: An 


active contour model where (one of) 
the energy functions is based on local 
region energies. [CV01] 


region-based segmentation: A class 


of segmentation techniques produc- 
ing a number of image regions, 
typically on the basis of a given 
homogeneity criterion. For instance, 
intensity image regions can be homo- 
geneous by color (see color image 
segmentation) or texture properties 
(see texture field segmentation); range 
image regions can be homogeneous by 
shape or curvature properties (see HK 
segmentation). [JKS95:3.2]


region boundary extraction: The 


problem of computing the boundary of 
a region, e.g., the contour of a region 
in an intensity image after color image 
segmentation. [PY09] 


region decomposition: A class of 


algorithms aiming to partition an 
image or region thereof into regions. 


See also region-based segmentation. 
[JKS95:3.2]


region descriptor: 1) One or more 


properties of a region, such as 
compactness or moments. 

2) The data structure containing all 
data pertaining to a region. For 
instance, its position in the image 
(e.g., the coordinates of the center 
of mass), its contour (e.g., a list of 
2D coordinates), some indicator of its 
shape (e.g., compactness or perimeter 
squared over area) and the value of its 
homogeneity index. [NA05:7.3] 


region detection: A vast class of 


algorithms seeking to partition an 
image into regions with particular 
properties. See region identification, 
region labeling, region matching 
and region-based segmentation. 
[SOS00:4.3.2] 


region filling: A class of algorithms 


assigning a given value to all the 
pixels in the interior of a closed 
object contour identifying a region. 
For instance, one may want to 
fill the interior of a closed con- 
tour in a binary image with zeros 
or ones. See also morphology, 
mathematical morphology operation 
and binary mathematical morphology. 
[SOS00:4.3.2] 


region growing: A class of algorithms 


that construct a connected region by 
incrementally expanding the region, 
usually at the boundary. New data con- 
sistent with the region are merged 
into the region. The region is often 
redescribed after each new set of data 
is added to it. Many region-growing 
algorithms have the following form: 

1. Describe the region based on its cur- 
rent pixels (e.g., fit a linear model to 
the intensity distribution). 

2. Find all pixels adjacent to the cur- 
rent region. 

3. Add an adjacent pixel to the 
region if the region description also 
describes this pixel (e.g., it has a 
similar intensity). 

4. Repeat from Step 1 as long as new 
pixels continue to be added. 

A similar algorithm exists for region
growing with 3D points, giving a
surface fitting. The data points could
come from a regular grid (pixel or
voxel) or from an unstructured list. In
the latter case, it is harder to determine
adjacency. [JKS95:3.5]
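
The four steps listed in this entry can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes the region is described by its mean intensity, uses 4-adjacency, and accepts a pixel when it lies within a fixed tolerance of the mean. The name `grow_region` and the toy image are hypothetical.

```python
def grow_region(image, seed, tolerance):
    """Grow a connected region from a seed pixel; returns a set of (row, col)."""
    rows, cols = len(image), len(image[0])
    region = {seed}
    changed = True
    while changed:
        # Step 1: describe the region (here, by its mean intensity).
        mean = sum(image[r][c] for r, c in region) / len(region)
        # Step 2: find all pixels 4-adjacent to the current region.
        border = set()
        for r, c in region:
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                    border.add((nr, nc))
        # Step 3: add adjacent pixels that the description also fits.
        accepted = {p for p in border if abs(image[p[0]][p[1]] - mean) <= tolerance}
        region |= accepted
        # Step 4: repeat while new pixels are still being added.
        changed = bool(accepted)
    return region


image = [
    [10, 11, 50, 52],
    [9, 10, 51, 50],
    [10, 12, 49, 51],
]
region = grow_region(image, seed=(0, 0), tolerance=5)
```

With this image the region stops at the intensity step between the second and third columns.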


region identification: A class of algo- 


rithms seeking to identify regions with 
special properties, e.g., human figures 
in a surveillance video or road vehi- 
cles in an aerial sequence. Region 
identification covers a very wide area 
of techniques spanning many appli- 
cations, including remote sensing, 
visual surveillance, and
agricultural and forestry surveying. 
See also target recognition, automatic 
target recognition (ATR), binary object 
recognition, object recognition and 
pattern recognition. [OM98] 


region invariant: 1) A property of 


a region that does not change after 
some transformation is applied to the 
region, such as translation, rotation or 
perspective projection. 

2) A property or function which is 
invariant over a region. [SH95] 


region labeling: A class of algorithms 


that are used to assign a label or mean- 
ing to each image region in a given 
image segmentation to achieve an 
appropriate image interpretation. Rep- 
resentative techniques are relaxation 
labeling, probabilistic relaxation 
labeling, and interpretation trees (see 
interpretation tree search). See also 
labeling problem. [TA02]


region of interest: A subregion of an 


image where processing is to occur. 
Regions of interest may be used to 
reduce the amount of computation 
that is required or to focus processing 
so that image data outside the region 
do not distract from or distort results. 
As an example, when tracking a tar- 
get through an image sequence, most 
algorithms for locating the target in the 
next video frame only consider image 
data from a region of interest surround- 
ing the predicted target position. The 
figure shows a boxed region of inter- 
est: [WP:Region_of_interest] 


region of support: The subregion of an 


image that is used in a particular com- 
putation. For example, edge detection 
usually only uses a subregion of pixels 
neighboring the pixel under consider- 
ation for being an edge. [KMJ05:5.4.1- 
5.4.2] 


region matching: 1) Establishing the
correspondences between matching
members of two sets of regions.

2) Determining the degree of similarity
between two regions, i.e., solving
the feature matching problem for
regions. See, e.g., template matching,
color matching and color histogram
matching. [LWW00]


region propagation: The problem of
tracking moving image regions. [RC99]


region merging: A class of algorithms
that fuse two image regions into one if
a given homogeneity criterion is satisfied.
See also region-based
segmentation and region splitting.
[Sch89:Ch. 6]


region representation: A class of methods
to represent the defining characteristics
of an image region. For encoding
the shapes, see axial representation,
convex hull, graph model, quadtree,
run-length coding and skeletonization.
For encoding a region by its properties,
see moments, curvature scale space,
Fourier shape descriptor, wavelet
descriptor and shape representation.
[JKS95:3.3]


region segmentation: See region-based
segmentation.


region neighborhood graph: See 
region adjacency graph. 


region snake: A snake representing the 
boundary of some region. The opera- 
tion of computing the snake may be 
used as a region segmentation technique.
[CRB99]


region splitting: A class of algorithms 
dividing an image, or a region thereof, 
into parts (subregions) if a given 
homogeneity criterion is not satis- 
fied over the region. See also region, 
region-based segmentation and region 
merging. [Sch89:Ch. 6] 


regional activity: An activity that takes 
place in a localized region (as opposed 
to a global activity). [LXG09] 


registration: A class of techniques 
aiming to align, superimpose, or 
match two objects of the same 
kind (e.g., images, curves or mod- 
els); more specifically, they compute 
a geometric transformation superim- 
posing one object on the other. 
For instance, image registration deter- 
mines the region common to two 
images, thereby finding the planar 
transformation (rotation and transla- 
tion) that brings them into alignment; 
similarly, curve registration determines 
the transformation aligning a similar 
(or the same) part of two curves. The 
figure shows the registration (on the 
right) of the solid (left) and dashed 
(middle) curves: 


The transformation need not be rigid; 
non-rigid registration is common 
in medical imaging, e.g., in digital 
subtraction angiography. Notice 
also that most often there is no 
exact solution, as the two objects 
are not exactly the same, and the 
best approximate solution must 
be found by least squares or more 
complex methods. See also Euclidean 
transformation, medical image 
registration, model registration and 
multi-image registration. [FP03:21.3] 


regression: A regression problem 
is a supervised learning problem 
where the response variable is one 
or more continuous variables. In 
linear regression, a linear relationship 


between the input and response vari- 
ables is assumed. Nonlinear models 
can also be considered, e.g., support 
vector regression and Gaussian 
process regression. See also curve 
fitting and surface fitting. [Bis06:p. 3]


regression testing: Regression testing 


verifies that changes to the implemen- 
tation of a system have not caused a 
loss of functionality, or regression to 
the state where that functionality did 
not exist. [WP:Regression_testing] 


regularization: A class of mathemati- 


cal techniques to solve an ill-posed 
problem. In essence, to determine a 
single solution, one introduces the 
constraint that the solution must be 
smooth, in the intuitive sense that sim- 
ilar inputs must correspond to similar 
outputs. The problem is then cast as a 
variational problem, in which the vari- 
ational integral depends both on the 
data and on the smoothness constraint. 
For instance, a regularization approach
to the problem of estimating a function
$f$ from a set of values $y_1, y_2, \ldots, y_N$ at
the data points $x_1, \ldots, x_N$ leads to the
minimization of the functional

$$H(f) = \sum_{i=1}^{N} \left( f(x_i) - y_i \right)^2 + \lambda \Phi(f)$$

where $\Phi(f)$ is the smoothness functional
and $\lambda$ is a positive parameter
called the regularization number.
[JKS95:13.7]


relational graph: A graph in which 


the arcs express relations between 
the properties of image entities (e.g., 
regions or other features) that are the 
nodes in the graph. For regions, com- 
monly used properties are adjacency, 
inclusion, connectedness and relative 
area size. See also region adjacency 
graph (RAG) and shape representation. 
In the figure, the adjacency relations 
of the regions (left) are encoded in the 
RAG (right): [DH73:12.2.2] 


relational matching: A class of matching
algorithms based on relational
descriptors. See also relational graph.
[BB82:11.2]


relational model: See relational graph.


relational shape description: A class
of shape representation techniques
based on relations between the properties
of image entities (e.g., regions
or other features). For regions, commonly
used properties are adjacency,
inclusion, connectedness and relative
area size. See also relational graph and
region adjacency graph. [Sha80]


relative depth: The difference in depth
(distance from some observer) values
for two points. In certain situations, it
may not be possible to compute actual
or absolute depth but it may be possible
to compute relative depth. [Pra80]


relative motion: The motion of an
object with respect to some other, possibly
also moving, frame of reference
(typically the observer’s). [Reg86]


relative orientation: The problem of
computing the orientation of an object
with respect to another coordinate system,
such as that of the sensor. More
specifically, the rotation matrix aligning
the reference frames attached to
the object and second object. See also
pose and pose estimation. [JKS95:12.4]


relaxation: A technique for assigning
values from a continuous or discrete
set to the nodes of a network or graph
by propagating the effects of local
constraints. The network can be an
image grid, in which case the pixels
are nodes, or features such as edges
or regions. At each iteration, each
node interacts with its neighbors, altering
its value according to the local
constraints. As the number of iterations
increases, the effects of local constraints
are propagated to more parts of
the network. Convergence is achieved
when no more changes occur or
changes become insignificant. See also
discrete relaxation, relaxation labeling
and probabilistic relaxation labeling.
[SQ04:6.1]


relaxation labeling: A relaxation technique
for assigning a label from
a discrete set to each node of a
network or graph. A classic example
in artificial intelligence is the
Waltz line labeling algorithm (see also
line-drawing analysis). [JKS95:14.3]


relaxation matching: A relaxation
labeling technique for model matching,
the purpose of which is to label
(match) each model primitive with
a scene primitive. Starting from an
initial labeling, the algorithm iteratively
harmonizes neighboring labels
using a coherence measure for the
set of matches. See also discrete
relaxation, relaxation labeling and
probabilistic relaxation labeling.
[LS88]


relaxation segmentation: A class of 


segmentation techniques based on 
relaxation. See also image segmen- 
tation. [BT88:Ch. 5] 


relevance feedback: Feedback (e.g., 


from a user) on whether results 
returned (e.g., from a search or image 
retrieval) are relevant. This feedback 
information can be used to improve 
further searches. [MRS08:Ch. 9]


relevance learning: Some meth- 


ods such as the k-nearest-neighbor 
algorithm depend on measuring 
the similarity of instances in terms 
of a distance metric in the input 
feature space. In relevance learning, 
this metric may be varied (e.g., by 
re-weighting different dimensions) to 
improve performance. [XNJR03]


relevance vector machine (RVM): 


A Bayesian sparse kernel method 
for regression and classification 
problems. The prediction function
$f(x) = \sum_{i=1}^{n} w_i k(x, x_i)$ is represented
by a linear combination of kernel
functions, where $i = 1, \ldots, n$ indexes
the data points. As in support vector 
regression (and in contrast to Gaussian 
process regression and kernel ridge 
regression), the RVM solution for the 
w coefficients is sparse, i.e., many of 
them are zero. [Bis06:7.2] 


relighting: A technique for altering an 


image so that it appears to have been 
taken under different lighting condi- 
tions. [WGT+05] 


remote sensing: The acquisition, 
analysis and understanding of imagery, 
mainly of the earth’s surface, acquired 
by airplanes or satellites. Used 
frequently in agriculture, forestry, 
meteorological and military applica- 
tions. See also multi-spectral analysis, 
multi-spectral image and geographic 
information system. [Sch89:Ch. 6] 


representation: A description or model 
specifying the properties defining an 
object or class of objects. A classic 
example is shape representation - a 
group of techniques for describing 
the geometric shape of 2D and 3D 
objects. See also Koenderink’s surface 
shape classification. Representations 
can be symbolic or non-symbolic 
(see symbolic object representation 
and non-symbolic representation),
a distinction inherited from artifi- 
cial intelligence. [WP:Representation_ 
(mathematics)] 


resection: The computation of the posi- 
tion of a camera given the images of 
some known 3D points. Also known as 
camera calibration or pose estimation. 
[HZ00:21.1] 


residual: In a regression problem, 
the difference between the observed 
response and the corresponding pre- 
diction. [Mur12:16.3] 


resolution: The number of pixels per 
unit area, length, visual angle etc. 
[Low91:p. 236] 


restoration: Given a noisy sample of 
some true data, the goal of restoration 
is to recover the best possible estimate 
of the original true data. [TV98:3.1.1] 


reticle: The network of fine wires or 
receptors placed in the focal plane of 
an optical instrument for measuring 
the size or position of the objects under 
observation. [WP:Reticle] 


retina: The photosensitive surface at the 
back of the eye. The retina is a highly 
complex manifold structure that turns 
incident light into nerve impulses that 
are carried to the visual pathways of 
the brain. [FP03:1.3] 


retinal image: The image which is 


formed on the retina of the eye. 
[Nal93:1.2.2] 


retinex: An image enhancement algo- 
rithm based on retinex theory, aiming 
to compute an illuminant-independent 
quantity called lightness at each image 
pixel. The key observation is that nor- 
mal illumination on a surface changes 
slowly, leading to slow changes in 
the observed brightness of a surface. 
This contrasts with strong changes 
in brightness at reflectance and fold 
edges. The retinex algorithm removes 
the slowly varying components by 
exploiting the fact that the observed 
brightness $B = L \times I$ is the product of
the surface lightness (or reflectance)
$L$ and the illumination $I$. By taking
the logarithm of $B$ at each pixel,
the product of $L$ and $I$ becomes a
sum of logarithms. Slow changes can
be detected by differentiation and 
then removed by thresholding. Re- 
integration of the result produces the 
lightness image (up to an arbitrary 
scale factor). [Hor86:9.3] 
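
The log, differentiate, threshold and re-integrate sequence can be illustrated along a single scan line. The sketch below is a simplified 1D version under stated assumptions: finite differences stand in for differentiation, a running sum for re-integration, and the threshold value is chosen by hand. The function name `retinex_1d` and the synthetic data are hypothetical.

```python
import math


def retinex_1d(brightness, threshold):
    """Recover lightness along a scan line, up to an arbitrary scale factor."""
    log_b = [math.log(b) for b in brightness]              # product becomes a sum
    diffs = [log_b[i + 1] - log_b[i] for i in range(len(log_b) - 1)]
    kept = [d if abs(d) > threshold else 0.0 for d in diffs]  # drop slow changes
    log_l = [0.0]                                          # arbitrary starting level
    for d in kept:
        log_l.append(log_l[-1] + d)                        # re-integrate
    return [math.exp(v) for v in log_l]


# Synthetic scan line: a reflectance step (0.2 to 0.8) under slowly rising illumination.
reflectance = [0.2] * 4 + [0.8] * 4
illumination = [1.02 ** i for i in range(8)]
brightness = [l * i for l, i in zip(reflectance, illumination)]
lightness = retinex_1d(brightness, threshold=0.1)
```

The slow illumination ramp disappears and only the reflectance step survives. The recovered step ratio is 4.08 rather than exactly 4, because the one retained difference still contains a single step of the illumination gradient.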


retro-illumination: The reflection of 
unfocused light off a surface in order 
to observe the back lighting of a dif- 
ferent structure (e.g., the lens in the 
human eye). [KO80] 


retroreflection: The reflection of 
light back towards its source with 
the minimum possible scattering. 
[WP:Retroreflector] 


reverse engineering: In the field of 
computer vision, the problem of gen- 
erating a model of a 3D object from 
a set of views, e.g., a VRML or a 
triangulated model. The model can 
be purely geometric (describing just 
the object’s shape) or can combine 
shape and textural properties. Tech- 
niques exist for reverse engineering 
from both range images and intensity 
images. See also geometric model and 
model acquisition. [TV98:4.6] 


RGB: A format for color images, encoding 
the red, green, and blue components 
of each pixel in separate channels. 
See also YUV and color image. [FP03: 
6.3.1] 


ribbon: A shape representation for pipe- 
like planar objects whose contours are 
approximately parallel, e.g., roads in 
aerial imagery. See also generalized
cones and shape representation.
[FP03:24.2.2-24.2.3]


Ricci flow: A type of nonlinear diffusion
equation on a Riemannian manifold
used in shape analysis. [WSG10]


ridge: A particular type of discontinuity 
of the intensity function, giving rise to 
thick edges and lines. The figure shows 
a characteristic dark-to-light-to-dark 
intensity ridge profile along a scan line: 


[Figure: intensity plotted against pixel position along a scan line]


See also step edge, roof edge and edge 
detection. [WP:Ridge_detection] 


ridge detection: A class of algo- 
rithms, especially edge and line detec- 
tors, for detecting ridges in images. 
[WP:Ridge_detection] 


ridge regression: A method for the 
regularization of linear regression and 
related methods, where a penalty term 
is added. The penalty term consists of 
the sum of squares of the regression 
coefficients, thus penalizing large coef- 
ficients. [HTF08:3.4] 
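
The shrinking effect of the penalty is easiest to see with a single coefficient. The sketch below assumes one input feature and no intercept, so that minimizing $\sum_i (y_i - w x_i)^2 + \lambda w^2$ has the closed form $w = \sum_i x_i y_i / (\sum_i x_i^2 + \lambda)$. The function name `ridge_slope` and the data are made up for illustration.

```python
def ridge_slope(xs, ys, penalty):
    """Minimize sum((y - w*x)**2) + penalty * w**2 over the single coefficient w."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]                          # exactly y = 2x
w_least_squares = ridge_slope(xs, ys, 0.0)    # no penalty: ordinary least squares
w_ridge = ridge_slope(xs, ys, 14.0)           # penalty shrinks the coefficient
```

With several features the same idea applies to the whole coefficient vector, which requires solving a linear system rather than a single division.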


Riemannian manifold: A differentiable 
manifold that is endowed with a metric 
tensor which may vary smoothly from 
point to point. [Lov10:5.1] 


right-handed coordinate system: A 
3D coordinate system with the XYZ 
axes arranged as shown: 


[Figure: coordinate axes, with +Z pointing out of the page]

The alternative is a left-handed
coordinate system. [FP03:2.1.1]


rigid body segmentation: The problem 
of automatic partitioning of the image 
of an articulated object or deformable 
model into a number of rigid subcom- 
ponents. See also part segmentation 
and recognition by components (RBC). 
[WM97] 


rigid motion estimation: A class of 
techniques aiming to estimate the 3D 
motion of a rigid body or scene in 
space from a sequence of images by 
assuming that there are no changes 
in shape. Rigidity simplifies the prob- 
lem significantly so that changes in 
appearance arise solely from changes 
in relative position and projection. 
Techniques exist for using known 3D 
models; for estimating the motion of 
a general cloud of 3D points or from 
image feature points; and for estimat- 
ing motion from optical flow. See 
also motion estimation and egomotion. 
[Hor86:17.2] 


rigid registration: Registration where 
neither the model nor the data is 
allowed to deform. This reduces reg- 
istration to estimating the Euclidean 
transformation that brings the model 
into alignment with the data. See also 
non-rigid registration. [RPMA01]


rigidity constraint: The assumption 
that a scene or object under analysis 
is rigid, implying that all 3D points 
remain in the same relative positions 
in space. This constraint can signifi- 
cantly simplify many algorithms, e.g., 
shape reconstruction (see shape and 
the following “shape from” entries) 
and motion estimation. [JKS95:14.7] 


ring artifact: An artifact in which rings 
(circular patterns) appear in computed 
axial tomography (CT) images. [Rav98] 


road structure analysis: A class of 
techniques used to derive information 
about roads from images. These can be 
close-up images (e.g., images of the tar- 
mac as acquired from a moving vehi- 
cle, used to map defects automatically 
over extended distances) or remotely 
sensed images (e.g., to analyze the geo- 
graphical structure of road networks). 
[SCW95] 


Roberts cross gradient operator: An 


operator used for edge detection, 
computing an estimate of perpendicu- 
lar components of the image gradient 
at each pixel. The image is convolved 
with the two Roberts kernels, yielding 
two components, $G_x$ and $G_y$, for
each pixel. The gradient magnitude
$\sqrt{G_x^2 + G_y^2}$ and orientation $\arctan(G_y / G_x)$
can then be estimated as for any 2D
vector. See also edge detection, entries
for specific edge detectors (Canny, 
Deriche, Hueckel, Kirsch compass, 
Marr-Hildreth, O’Gorman and 
Robinson), Sobel gradient operator 
and Sobel kernel. [JKS95:5.2.1] 


Roberts kernel: A pair of kernels, or 


masks, used to estimate perpendicular 
components of the image gradient 
within the Roberts cross gradient 
operator: 


The masks respond maximally to edges
oriented at ±45° from the vertical axis
of the image. [JKS95:5.2.1]
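
Since the kernel figure is not reproduced in this text, the sketch below uses the commonly quoted 2×2 Roberts masks; treat their exact sign and ordering convention as an assumption. The masks are applied to each 2×2 window without flipping (correlation rather than strict convolution), which affects only the signs of the two components and not the magnitude. The names `ROBERTS_A`, `ROBERTS_B` and `roberts_gradient` are hypothetical.

```python
import math

# Commonly quoted Roberts masks (assumed convention).
ROBERTS_A = ((1, 0), (0, -1))
ROBERTS_B = ((0, 1), (-1, 0))


def roberts_gradient(image):
    """Gradient magnitude at each 2x2 window of a grayscale image (list of rows)."""
    rows, cols = len(image), len(image[0])
    magnitude = [[0.0] * (cols - 1) for _ in range(rows - 1)]
    for r in range(rows - 1):
        for c in range(cols - 1):
            window = ((image[r][c], image[r][c + 1]),
                      (image[r + 1][c], image[r + 1][c + 1]))
            ga = sum(ROBERTS_A[i][j] * window[i][j] for i in range(2) for j in range(2))
            gb = sum(ROBERTS_B[i][j] * window[i][j] for i in range(2) for j in range(2))
            magnitude[r][c] = math.hypot(ga, gb)
    return magnitude


# A vertical step edge between the second and third columns.
image = [
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
]
magnitude = roberts_gradient(image)
```

Only the windows that straddle the step produce a non-zero response.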


Robinson edge detector: An operator 


for edge detection, computing an esti- 
mate of the directional first derivatives 
of the image in eight directions. The 
image is convolved with the eight ker- 
nels, three of which are shown here: 

Two of these, typically those responding
maximally to differences along
the coordinate axes, can be taken as
estimates of the two components of
the gradient, $G_x$ and $G_y$. The gradient
magnitude $\sqrt{G_x^2 + G_y^2}$ and orientation
$\arctan(G_y / G_x)$ can then be estimated as for
any 2D vector. See also edge detection,
Roberts cross gradient operator, Sobel 


gradient operator, Sobel kernel 


and entries for specific edge detec- 
tors (Canny, Deriche, Hueckel, 
Kirsch compass, Marr-Hildreth and 


O’Gorman). [Umb98:2.3.5] 


robot behavior: The actions of a robotic 


device. [HV99] 


robot vision: Any automated vision sys- 


tem used to provide a robotic device 
with visual information or feedback. 
Robot vision is somewhat different 
from traditional computer vision in 
that it can be part of a closed-loop sys- 
tem where the vision system guides the 
robot motion which in turn is observed 
by the vision system allowing the robot 
motion to be adapted or corrected. 
[HS92] [Hor86] 


robust: A general term referring to a 


technique which is insensitive to noise 
or other perturbations. [FP03:15.5] 


robust estimator: A statistical estimator
that, unlike normal least squares
estimation, is not distracted by even a
significant percentage of outliers in the 
data. Popular robust estimators include 
RANSAC, least median of squares and 
M-estimation. See also outlier rejection. 
[FP03:15.5] 
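
As one example of a robust estimator, the following is a minimal RANSAC-style sketch for fitting a line $y = mx + b$. It repeatedly fits a line to two randomly chosen points and keeps the line supported by the most inliers. The function name `ransac_line`, the iteration count, the tolerance and the fixed random seed are all assumptions of this illustration.

```python
import random


def ransac_line(points, n_iterations=200, inlier_tolerance=0.5, seed=0):
    """Return ((slope, intercept), inliers) for the best-supported line found."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    best_model, best_inliers = None, []
    for _ in range(n_iterations):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical line: cannot be written as y = m*x + b
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        inliers = [(x, y) for x, y in points
                   if abs(y - (slope * x + intercept)) <= inlier_tolerance]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (slope, intercept), inliers
    return best_model, best_inliers


# Twenty points on y = 2x + 1 plus three gross outliers.
points = [(float(x), 2.0 * x + 1.0) for x in range(20)]
points += [(3.0, 40.0), (10.0, -15.0), (15.0, 90.0)]
model, inliers = ransac_line(points)
```

A least squares fit to all 23 points would be pulled towards the outliers; here they simply fail to gather support. In practice a final least squares fit to the inliers usually follows.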


robust regression: A form of regression 


that does not use outlier values in 
computing the fitting parameters. For 
example, normal regression methods 
use all data points to carry out a least 
squares straight line fit; this can give 
distorted results if even one point is far 
away from the “true” line. Robust pro- 
cesses either eliminate these outlying 
points or reduce their contribution to 
the results. The figure shows a rejected 
outlying point: [JKS95:6.8.3] 


[Figure: a fitted line with the inliers and a rejected outlier marked]


robust statistics: A general term describing
statistical methods that are not
significantly influenced by outliers. 
[WP:Robust_statistics] 


robust technique: See robust estimator. 


ROC: See receiver operating curve and 
performance analysis for vision. 


Rodrigues rotation formula: An effi- 
cient algorithm for rotating a vector by 
some angle in 3D space around a spec- 
ified axis. [Sze10:p. 42] 
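
The formula rotates a vector $v$ by angle $\theta$ about a unit axis $k$ as $v\cos\theta + (k \times v)\sin\theta + k\,(k \cdot v)(1 - \cos\theta)$. A minimal sketch follows; the function name `rodrigues_rotate` is hypothetical and the axis is normalized inside the function for convenience.

```python
import math


def rodrigues_rotate(v, axis, angle):
    """Rotate 3D vector v by `angle` radians about `axis` (right-hand rule)."""
    norm = math.sqrt(sum(a * a for a in axis))
    k = [a / norm for a in axis]  # unit rotation axis
    cos_t, sin_t = math.cos(angle), math.sin(angle)
    k_cross_v = [k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0]]
    k_dot_v = sum(ki * vi for ki, vi in zip(k, v))
    return [v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1.0 - cos_t)
            for i in range(3)]


# Rotating the x axis by 90 degrees about the z axis should give the y axis.
rotated = rodrigues_rotate([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], math.pi / 2)
```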


rod (in eye): Photoreceptor cells in the 
human eye which are very sensitive 
to all wavelengths of visible light (i.e., 
they sense the luminance level). Rods 
are complemented by cones within the 
retina. [FP03:1.3] 


roll: A 3D rotation representation com- 
ponent (along with pitch and yaw) 
often used for cameras or moving 
observers. The roll component speci- 
fies a rotation about the optical axis or 
line of sight. The figure shows the roll 
rotation direction: [JKS95:12.2.1] 


rolling shutter camera: A camera in
which the time at which a pixel col- 
lects photons varies from pixel to 
pixel. This may occur for mechanical 
or electronic reasons. It allows contin- 
ued acquisition of photons from most 
other pixels while a given pixel is 
being sampled. This contrasts with a 
camera where all pixels capture pho- 
tons and are read out simultaneously. 
[WP:Rolling shutter] 


roof edge: 1) An image edge where 
the values increase continuously to 
a maximum and then decrease con- 
tinuously, such as the brightness val- 
ues on a Lambertian surface cylinder 
when lit by a point light source or 
the orientation discontinuity (the fold
edge) in a range image. 

2) A scene edge where an orientation 
discontinuity occurs. The figure shows 
a horizontal roof edge in a range image: 
[JKS95:Ch. 5] 


[Figure: range image with the ROOF EDGE marked]


rotating mask: A mask which is consid- 
ered in a number of orientations rela- 
tive to some pixel. See, e.g., the masks 
used in the Robinson edge detector. 
Most commonly used as a type of 
average smoothing: the most homoge- 


neous mask is used to compute the 
smoothed value for every pixel. In the 
figure, the major boundaries have not 
been smoothed although image detail 
has been reduced: [DVD+93] 


rotation: A circular motion of a set of 
points or an object around a given 
point (2D) or line (3D, called the axis 


of rotation). [JKS95:12.2.1-12.2.2] 


rotation estimation: The estimation of
rotation from raw or processed image,


video or range data, typically from 

two sets of corresponding points (or 

lines or planes) taken from rotated ver- 
sions of a pattern. The problem usually 
appears in one of three forms: 

• estimating the 3D rotation from 3D
data (three points are needed);

• estimating the 3D rotation from 2D
data (three points are needed but
lead to multiple solutions);

• estimating the 2D rotation from 2D
data (two points are needed).

A second issue to consider is the effect
of noise: typically more than the minimum
number of points are needed
to counteract the effects of noise,
which leads to least squares algorithms.

[OK98] 
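
The least squares idea can be sketched for the simplest case, 2D rotation from 2D correspondences. This sketch assumes the rotation is about the origin (no translation) and that the point pairs are already matched; the name `estimate_rotation_2d` is hypothetical and the method is a standard closed form, not necessarily the one in [OK98]. Minimizing the summed squared distance between rotated and observed points gives θ = atan2(Σ(x y' − y x'), Σ(x x' + y y')):

```python
import math


def estimate_rotation_2d(points, rotated):
    """Least squares angle (radians) that rotates `points` onto `rotated`.

    Both arguments are equal-length sequences of (x, y) pairs; the rotation
    is assumed to be about the origin.
    """
    if len(points) != len(rotated) or len(points) == 0:
        raise ValueError("need two non-empty point sets of equal size")
    cross = sum(x * yr - y * xr for (x, y), (xr, yr) in zip(points, rotated))
    dot = sum(x * xr + y * yr for (x, y), (xr, yr) in zip(points, rotated))
    return math.atan2(cross, dot)
```

With noise-free data any single non-zero point determines the angle; using all available points averages out measurement noise.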


rotation invariant: A property that 


keeps the same value even if any of the 
data values, the camera, the image or 
the scene from which the data comes 
is rotated. One needs to distinguish 
between 2D (i.e., in the image) and 3D 
(i.e., in the scene) rotation invariance. 
For example, the angle between two 
image lines is invariant to image rota- 
tion, but not to rotation of the lines in 
the scene. [WP:Rotational_invariance] 


rotation matrix: A linear operator rotat- 


ing a vector in a given space. The 
inverse of a rotation matrix equals 
its transpose. A rotation matrix has 
only three degrees of freedom in 
3D and one in 2D. In 3D space, 
there are three eigenvalues: 1, cos θ +
i sin θ and cos θ − i sin θ, where θ is the
rotation angle and i is the
imaginary unit. A rotation matrix in
3D has nine entries but only three 
degrees of freedom, as it must sat- 
isfy six orthogonality constraints. It 
can be parameterized in various ways, 
usually through Euler angles, yaw- 
pitch-roll, rotation angles around the 
coordinate axes and axis-angle curve 
representations. See also orientation 
estimation, rotation representation and 
quaternions. [FP03:2.1.2] 


rotation operator: A linear opera- 


tor expressed by a rotation matrix. 
[JKS95:12.2.1] 


rotation representation: A formalism 


describing rotations and their alge- 
bra. The most frequent is definitely 


the rotation matrix, but quaternions, 
Euler angles, yaw-pitch-roll, rotation 
angles around the coordinate axes and 
axis-angle curve representations have 
also been used. [Hor86:18.10] 


rotational symmetry: The property of 


a set of points or an object to remain 
unchanged after a given rotation. For 
instance, a cube has several rota- 
tional symmetries, with respect to any 
90° rotation around any axis passing 
through the centers of opposite faces. 
See also rotation and rotation matrix. 
[WP:Rotational_symmetry] 


rotoscoping: A technique for creat- 


ing an animation from a live action 
sequence in which features (such as 
an outline of body position) are copied 
frame by frame from the live action 
sequence: [WP:Rotoscoping] 


RS-170: The standard black-and-white 


video format in the United States. 
The Electronic Industry Association 
(EIA) is the standards body that orig- 
inally defined the 525-line, 30-frame- 
per-second, TV standard for North 
America, Japan and a few other parts 
of the world. The EIA standard, also 
defined under US standard RS-170A, 
defines only the monochrome pic- 
ture component but is mainly used 
with the NTSC color encoding stan- 
dard. A version exists for PAL cameras. 
[Gal90:4.1.3] 


rubber sheet model: See mem- 


brane model. [WP:Gravitational_ 
well#The_rubber-sheet_model] 


rule-based classification: A method 


of object recognition, drawn from 
artificial intelligence, in which log- 
ical rules are used to infer object 
type. [WP:Concept_learning#Rule- 
based_theories_of_concept_learning] 


run code: See run-length coding. 


run-length coding: A lossless
compression technique used to 
reduce the size of a repeating string of 
characters, called a “run”, also applica- 
ble to images. The algorithm encodes 
a run of symbols into two bytes, a 
count and a symbol. For instance, 
the six-byte string “xxxxxx” would 
become “6x” and would occupy only 
two bytes. It can compress any type of 
information content, but the content 
itself affects, obviously, the compression
ratio. Compression ratios are not
high compared to other methods, but
the algorithm is easy to implement and
quick to execute. Run-length coding is
supported by bitmap file formats such
as TIFF, BMP and PCX. See also image
compression, video compression and
JPEG. [Jai89:11.2, 11.9]


run-length compression: See
run-length coding. 
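
A minimal sketch of run-length coding on a string, assuming each run is stored as a (count, symbol) pair and that counts are capped at 255 so that each would fit in one byte. The function names and the `max_run` parameter are illustrative, not part of any file format:

```python
from itertools import groupby


def rle_encode(data, max_run=255):
    """Encode a string as a list of (count, symbol) pairs."""
    pairs = []
    for symbol, group in groupby(data):
        run = len(list(group))
        # Split runs longer than max_run so every count fits in one byte.
        while run > 0:
            chunk = min(run, max_run)
            pairs.append((chunk, symbol))
            run -= chunk
    return pairs


def rle_decode(pairs):
    """Invert rle_encode."""
    return "".join(symbol * count for count, symbol in pairs)
```

Here `rle_encode("xxxxxx")` gives `[(6, "x")]`, matching the "6x" example in the run-length coding entry, while a string with no repeats produces one pair per symbol and so grows rather than shrinks.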


saccade: A movement of the eye or cam- 
era, changing the direction of fixation 
sharply. [WP:Saccade] 


salience: The extent to which something 
(e.g., a visual feature) stands out rela- 
tive to other nearby features (from the 
Latin salire meaning to leap). [Itt01] 


saliency map: A representation 
encoding the saliency of given 
image elements, typically fea- 
tures or groups thereof. See also 
salient feature, Gestalt, perceptual 
grouping and perceptual organization. 
[WP:Salience_(neuroscience)] 


salient behavior: Behavior of a person 
or system which is distinct from nor- 
mal behavior. 


salient feature: A feature associated 
with a high value of a saliency measure, 
quantifying feature suggestiveness for 
perception. For instance, inflection 
points have been indicated as salient 
features for representing contours. 
Saliency is a concept that originated 
from Gestalt psychology. See also 
perceptual grouping and perceptual 
organization. [KK98] 


salient pixel group: A group of pixels 
that exhibits a distinct pattern relative 
to neighboring pixels. [XG06] 


salient point: Typically, a feature which 
is distinct relative to those around it. 
[SL03]


salient regions: Image regions that 
are interesting relative to their local 
image context. They should be sta- 
ble to global transformations (includ- 
ing scale, illumination and perspective 
distortions) and image noise. They 
can be used for object representation, 
correspondence matching, tracking 
etc. [KZB04] 


salt-and-pepper noise: A type of impulsive
noise. Let $x, y \in [0, 1]$ be two
uniform random variables, $I$ the true
image value at a given pixel and $I_n$
the corrupted (noisy) version of $I$. We
can define the effect of salt-and-pepper
noise as $I_n = i_{\min} + y\,(i_{\max} - i_{\min})$ iff
$x \ge l$ (and $I_n = I$ otherwise), where $l$ is
a parameter controlling how much of the
image is corrupted and $i_{\min}$, $i_{\max}$ are the range of
the noise. See also image noise and
Gaussian noise. The figure was cor- 
rupted with 1% noise: [TV98:3.1.2] 
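
A minimal sketch of this noise model, assuming the image is a list of rows of numbers and that a pixel is corrupted when x ≥ l, so that l = 0.99 corrupts about 1% of pixels. The function name, the default range 0 to 255 and the optional `rng` argument are illustrative assumptions:

```python
import random


def salt_and_pepper(image, l, i_min=0, i_max=255, rng=None):
    """Return a noisy copy of `image` (a list of rows of numbers).

    Each pixel is replaced by i_min + y * (i_max - i_min) when x >= l,
    where x and y are uniform random numbers drawn per pixel.
    """
    rng = rng or random.Random()
    noisy = []
    for row in image:
        new_row = []
        for value in row:
            x, y = rng.random(), rng.random()
            if x >= l:
                value = i_min + y * (i_max - i_min)
            new_row.append(value)
        noisy.append(new_row)
    return noisy
```

Passing a seeded `random.Random` instance makes the corruption repeatable, which is convenient when comparing noise filters.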


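sample covariance: For a d-dimensional
data set represented as a set of n
column vectors $\mathbf{x}_i$ for $i = 1, \ldots, n$
with sample mean $\mathbf{m}$, the sample
covariance is the $d \times d$ matrix
$\mathbf{S} = \frac{1}{n} \sum_{i=1}^{n} (\mathbf{x}_i - \mathbf{m})(\mathbf{x}_i - \mathbf{m})^\top$. See also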
covariance matrix. [MKB79:1.4.1] 

sample mean: For a d-dimensional data
set represented as a set of n column
vectors $\mathbf{x}_i$ for $i = 1, \ldots, n$, the sample
mean is $\mathbf{m} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i$. See also mean.
[MKB79:1.4.1] 
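
A minimal sketch of the sample mean and sample covariance, assuming the data set is a list of equal-length lists and using the 1/n normalization under which the scatter matrix is n times the sample covariance. The function names are illustrative:

```python
def sample_mean(vectors):
    """Component-wise mean of a list of d-dimensional vectors."""
    n = len(vectors)
    d = len(vectors[0])
    return [sum(x[j] for x in vectors) / n for j in range(d)]


def scatter_matrix(vectors):
    """Sum of outer products of the deviations from the sample mean."""
    m = sample_mean(vectors)
    d = len(m)
    s = [[0.0] * d for _ in range(d)]
    for x in vectors:
        dev = [x[j] - m[j] for j in range(d)]
        for r in range(d):
            for c in range(d):
                s[r][c] += dev[r] * dev[c]
    return s


def sample_covariance(vectors):
    """Scatter matrix divided by the number of samples n."""
    n = len(vectors)
    return [[v / n for v in row] for row in scatter_matrix(vectors)]
```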


sampling: The transformation of a con- 
tinuous signal into a discrete one by 
recording its values at discrete instants 
or locations. Most digital images are 
sampled in space, time and intensity,
as intensity values are defined only 
on a regular spatial grid and can only 
take integer values. The figure shows 
a continuous signal and its samples: 
[FP03:7.4.1] 


sampling bias: If samples are collected 
from a random variable according to 
the true distribution then any statis- 
tic computed from the sample should 
not deviate systematically from the 
population expectation. If the sample 
does not represent the true distribu- 
tion there is said to be “sampling bias”. 
[WP:Bias_(statistics)] 


sampling density: The density of a sam- 
pling grid, that is, the number of sam- 
ples collected per unit interval. See also 
sampling. [BB82:2.2.6] 


sampling theorem: If an image is sam- 
pled at a rate higher than its Nyquist 
frequency then an analog image could 
be reconstructed from the sampled 
image whose mean square error with 
the original image converges to zero as 
the number of samples goes to infinity. 
[Jai89:4.2]


Sampson approximation: An approximation
to the geometric distance in
the fitting of implicit curves or implicit
surfaces that are defined by a parameterized
function of the form $f(\mathbf{a}; \mathbf{x}) = 0$
for $\mathbf{x}$ on the surface $S(\mathbf{a})$ defined by
parameter vector $\mathbf{a}$. Fitting the surface
to the set of points $\{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$ consists
in minimizing a function of the form
$e(\mathbf{a}) = \sum_{i=1}^{n} d(\mathbf{x}_i, S(\mathbf{a}))$. Simple solutions
are often available if the distance
function $d(\mathbf{x}, S(\mathbf{a}))$ is the algebraic
distance $d(\mathbf{x}, S(\mathbf{a})) = f(\mathbf{a}; \mathbf{x})^2$. Under
certain common assumptions, the
optimal solution arises when d is
the more complicated geometric
distance $d(\mathbf{x}, S(\mathbf{a})) = \min_{\mathbf{y} \in S(\mathbf{a})} \|\mathbf{x} - \mathbf{y}\|^2$.
The Sampson approximation defines

$$d(\mathbf{x}, S(\mathbf{a})) = \frac{f(\mathbf{a}; \mathbf{x})^2}{\|\nabla f(\mathbf{a}; \mathbf{x})\|^2}$$

which is a first-order approximation to
the geometric distance. If an efficient
algorithm for minimizing weighted
algebraic distance is available, then
the Sampson iterations are a further
approximation, where the kth iterate
$\mathbf{a}_k$ is the solution to

$$\mathbf{a}_k = \arg\min_{\mathbf{a}} \sum_{i=1}^{n} w_i\, f(\mathbf{a}; \mathbf{x}_i)^2$$

with weights computed using
the previous estimate so $w_i = 1 / \|\nabla f(\mathbf{a}_{k-1}; \mathbf{x}_i)\|^2$. [HZ00:3.2.6, 11.4]
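
To illustrate how the three distances compare, the following sketch evaluates them for an origin-centered circle, assuming the implicit function f(r; x, y) = x² + y² − r², whose gradient with respect to the point is (2x, 2y). The circle model and the function name are illustrative choices, not taken from [HZ00]:

```python
import math


def circle_distances(point, radius):
    """Return (algebraic, sampson, geometric) squared distances from
    `point` to the origin-centered circle of the given radius."""
    x, y = point
    f = x * x + y * y - radius * radius      # implicit function value
    grad_sq = 4.0 * (x * x + y * y)          # squared norm of (2x, 2y)
    if grad_sq == 0.0:
        raise ValueError("gradient vanishes at the circle center")
    algebraic = f * f
    sampson = algebraic / grad_sq
    geometric = (math.hypot(x, y) - radius) ** 2
    return algebraic, sampson, geometric
```

For a point close to the circle, such as (1.1, 0) with unit radius, the Sampson value is much closer to the geometric distance than the algebraic value is, while needing only f and its gradient.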


SAR: See synthetic aperture radar.
[WP:Synthetic_aperture_radar] 


SAT: See symmetric axis transform. 


satellite image: An image of a sec- 
tion of the earth acquired using a 
camera mounted on an orbiting satel- 
lite. [WP:Satellite_imagery] 


saturation: Reaching the upper limit 
of a dynamic range. For instance, 
intensity saturation occurs for an 8- 
bit monochromatic image when inten- 
sities greater than 255 are recorded: 
any such value is encoded as 255, 
the largest possible value in the range. 
[WP:Colorfulness] 


Savitzky–Golay filtering: A class of
filters achieving least squares fitting of 
a polynomial to a moving window of a 
signal. Used for fitting and data smooth- 
ing. See also linear filter and curve 
fitting. [WP:Savitzky–Golay_filter_for_
smoothing_and_differentiation]


scalability: A general property of com- 
puter algorithms that means that the 
performance of the algorithm does 
not degrade significantly as the num- 
ber of inputs increases. For example, 
an image-processing algorithm is scal- 
able if its computation time remains 
nearly constant or grows only with the 
image size rather than as the square 
of the image size. It may also refer to 
actions on image databases where one 


would like nearly constant speeds irre- 
spective of the size of the database. 
[WP:Scalability] 


scalar: A one-dimensional entity; a real 
number. [WP:Scalar_(mathematics)]


scale: 1) The ratio between the size of an 


object, image or feature and that of a 
reference or model. 

2) The property that some image fea- 
tures are apparent only when viewed 
at a given size, such as a line being 
enlarged so much that it appears as a 
pair of parallel edge features. 

3) A measure of the degree to which 
fine features have been removed from 
or reduced in an image. One can ana- 
lyze images at multiple spatial scales, 
whereby only features in certain size 
ranges appear at each scale (see scale 
space and pyramid). [Nal93:3.1.2] 


scale invariant: A property that keeps 
the same value even if the data, the 
image or the scene from which the data 
comes is shrunk or enlarged. The ratio
perimeter²/area is invariant to image scaling.
[WP:Scale_invariance] 


scale operator: An operator, e.g., 


Gaussian smoothing, that suppresses 
details (high-frequency contents) in 
an image. Details at small scales are 
discarded. The resulting content can 
be represented in a smaller image. 
See also scale space, image pyramid, 
Gaussian pyramid, Laplacian pyramid 
and pyramid transform. [OP97] 


scale reduction: The result of the appli- 


cation of a scale operator. [CH01] 


scale selection: 1) When making some 


measurement (e.g., edge strength) that 
varies as the image or smoothing scale 
varies, there may be scale settings that 
are significant, e.g., when the measurement
achieves a local maximum or minimum
for some scale or when the measurement
is stable over wide ranges
of scale. The goal of scale selection is 
to identify these scales. Another well- 
known use is in the SIFT operator, 
where the feature points are selected 
based, in part, upon the local minima 
of the Laplacian. 

2) Selecting the size of operator that is 
tuned to a particular size of target. For 


example, an eye location needs to be 
tuned to the approximate likely size of 
eyes in the image. [Lin09] 


scale space: A theory for early vision 


developed to account properly for the 
multi-scale nature of images. The ratio- 
nale is that, in the absence of a priori 
information on the optimal spatial 
scale at which a specific problem 
should be treated (e.g., edge detec- 
tion), images should be analyzed at all 
possible scales, the coarser ones rep- 
resenting simplifications of the finer 
ones. The finest scale is the input image 
itself. See scale-space representation 
for details. [CS09:Ch. 5]


scale space filtering: The filtering oper- 


ation that transforms one resolution 
level into another in a scale space, e.g., 
Gaussian filtering. [Sch89:Ch. 7] 


scale space matching: A class of match- 


ing techniques that compare shape at 
various scales. See also scale space and 
image matching. [CS09:5.2.3] 


scale-space representation: A repre- 


sentation of an image and more gen- 
erally of a signal, making explicit 
the information contained at multi- 
ple spatial scales and establishing a 
causal relationship between adjacent 
scale levels. The scale level is identi- 
fied by a scalar parameter called the 
“scale parameter”. A crucial require- 
ment is that coarser levels, obtained 
by successive applications of a scale 
operator, should constitute simplifica- 
tions of previous (finer) levels, i.e., it 
should introduce no spurious details. 
A popular scale-space representation 
is the Gaussian scale space, in which 
the next coarser image is obtained 
by convolving the current image with 
a Gaussian kernel. The variance of 
this kernel is the scale parameter. 
See also image pyramid and Gaussian 
smoothing. [CS09:5.3] 
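
A minimal sketch of a Gaussian scale space for a 1D signal, assuming a kernel truncated at three standard deviations, replicated border samples and a fixed variance per smoothing step. These choices and the function names are illustrative simplifications rather than part of the definition:

```python
import math


def gaussian_kernel(variance):
    """Normalized 1D Gaussian weights truncated at about 3 sigma."""
    sigma = math.sqrt(variance)
    radius = max(1, int(math.ceil(3.0 * sigma)))
    weights = [math.exp(-(k * k) / (2.0 * variance))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, variance):
    """Convolve `signal` with a Gaussian, replicating border samples."""
    kernel = gaussian_kernel(variance)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out


def scale_space(signal, step_variance, levels):
    """Finest level is the input; each coarser level smooths the last."""
    stack = [list(signal)]
    for _ in range(levels):
        stack.append(smooth(stack[-1], step_variance))
    return stack
```

Because every output sample is a weighted average of input samples, the peak of an isolated impulse can only shrink from one level to the next, which is a simple way to see that coarser levels simplify finer ones.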


scaling: 1) The process of zooming or 


shrinking an image. 

2) Enlarging or shrinking a model to fit 
a set of data. 

3) The process of transforming a set of 
values so that they lie inside a standard 
range (e.g., [—1,1]), often to improve 
numerical stability. [Nal93:6.2.1] 


scaling factor: A numerical value com- 


monly used to resize a set of values. For 
example, one could divide a dataset by 
the difference between the largest and 
smallest values, resulting in values in 
[0,1]. It may be useful to also subtract 
the mean first. This operation is typi- 
cally done for two reasons: to ensure 
that all data from different properties 
have approximately the same magni- 
tude and to ensure that all values are 
not too large or too small. Both aspects 
tend to improve the numerical perfor- 
mance of an algorithm. [Bis06:p. 425] 
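
A minimal sketch of this rescaling, assuming the data are a flat list of numbers. Subtracting the minimum before dividing by the range is what places the result in [0, 1]; the `subtract_mean` option instead centers the values on zero. The function and parameter names are illustrative:

```python
def rescale(values, subtract_mean=False):
    """Divide by the range (max - min) after removing an offset.

    With subtract_mean=False the offset is the minimum, so results lie in
    [0, 1]; with subtract_mean=True the offset is the mean, so results are
    centered on zero and span a total width of 1.
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        raise ValueError("all values are equal; the scaling factor is zero")
    offset = sum(values) / len(values) if subtract_mean else lo
    return [(v - offset) / span for v in values]
```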


scanline: A single (horizontal) line of an 


image. Originally this term was used for 
cameras in which the image is acquired 
line by line by a sensing element that 
generally scans each pixel on a line 
and then moves onto the next line. 
[WP:Scan_line] 


scanline slice: The cross section of a 


structure along an image scanline. The 
figure shows the scanline slice of a con- 
vex polygon in a binary image: [SM97] 


scanline stereo matching: The stereo 


matching problem with rectified 
images, whereby corresponding points 
lie on scanlines with the same 
index. See also rectification and stereo 
correspondence problem. [OK85] 


scanning electron microscope
(SEM): A scientific microscope intro- 
duced in 1942. It uses a beam of 
highly energetic electrons to examine 
objects on a very fine scale. The 
imaging process is essentially the 
same as for a light microscope apart 
from the type of radiation used. 
Magnification is much higher than 
what can be achieved with light. The 
images are rendered in gray shades. 
This technique is particularly useful 
for investigating microscopic details 
of surfaces. [Hor86:11.1.3] 


scatter matrix: For a set of d-dimensional
points represented as column
vectors $\{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$, with mean
$\boldsymbol{\mu} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i$, the scatter matrix is the
$d \times d$ matrix

$$\mathbf{S} = \sum_{i=1}^{n} (\mathbf{x}_i - \boldsymbol{\mu})(\mathbf{x}_i - \boldsymbol{\mu})^\top$$

It is n times the sample covariance
matrix. [DH73:4.10]


scattergram: See scatterplot. 


scatterplot: A data display technique in 


which each data item is plotted as a sin- 
gle point in an appropriate coordinate 
system that might help a person to bet- 
ter understand the data. For example, 
if a set of estimated surface normals is 
plotted in a 3D scatterplot, then planar 
surfaces should produce tight clusters 
of points. The figure shows a set of data 
points plotted according to their values 
of features 1 and 2: [DH73:1.2] 


scene: The part of 3D space captured


by an imaging sensor, and every visible 
object therein. [Sch89:Ch. 1] 


scene analysis: The process of exam- 


ining an image or video, for the pur- 
pose of inferring information about the 
scene, such as the shape of the vis- 
ible surfaces, the identity of objects 
and their spatial or dynamic relation- 
ships. See also shape from contour 
and the following “shape from” entries, 
object recognition and symbolic object 
representation. [Sch89:6,7] 


scene classification: Deciding the 


genre of a particular image or frame 
from a video. For example, a system 
might categorize images as being cap- 
tured from one of {office scene, out- 
door urban scene, domestic scene, out- 
door natural scene}. [BZM06] 


scene constraint: Any constraint
imposed on the image data by the
nature of the scene, e.g., rigid motion,
the orthogonality of walls and floors
etc. [HZ00:9.4.1-9.4.2]


scene coordinates: A 3D coordinate
system that describes the position
of scene objects relative to a given
coordinate system origin. Alternative
coordinate systems are camera
coordinates, viewer-centered coordinates
and object-centered coordinates.
[JKS95:1.4.2]


scene decomposition: Segmentation of
an image into semantically meaningful
regions. For example, an interior office
image might be segmented into regions
such as {desk, chair, table, cabinet}.
See also semantic image segmentation.
[GFK09]


scene labeling: The problem of identifying
scene elements from image
data and associating them with
labels representing their nature and
roles. See also labeling problem,
region labeling, relaxation labeling,
image interpretation and scene
understanding. [BB82:12.4]


scene layout: The position of the main
elements in a scene. This knowledge
can be used to help with scene
decomposition or scene labeling. For
example, an outdoor scene layout
could consist of sky above, then green
fields, then a gray road, with a vehicle
on the road.


scene modeling: Constructing a
geometric model, graph model or
other type of model that describes the
contents and positioning of structures
in a scene. [vdHDT+07]


scene recognition: See scene
classification.


scene reconstruction: The problem of
estimating the 3D geometry of a scene,
e.g., the shape of visible surfaces or
contours, from image data. See also
reconstruction, shape from contour
and the following “shape from” entries,
architectural model reconstruction,
volumetric reconstruction, surface
reconstruction and slice-based
reconstruction. [WP:Computer_vision
#Scene_reconstruction]


scene understanding: The problem of
constructing a semantic interpretation
of a scene from image data, that is, how
to describe the scene in terms of object
identities and relationships among
objects. See also image interpretation,
object recognition, symbolic object
representation, semantic net, graph
model and relational graph. [LSF09]


scene vector: A representation used 


in video analysis to describe what 
is happening in the current frame. 
Each position k in the scene vector 
$(s_t^1, \ldots, s_t^k, \ldots, s_t^K)$ corresponds to a
different event class (e.g., a person
walking) and the value $s_t^k$ is the number
of detected instances of class k at
time t. [GX11:7.2]


SCERPO: Spatial correspondence, evi- 


dential reasoning and perceptual orga- 
nization. A well-known vision system 
developed by David Lowe that demon- 
strated recognition of complex polyhe- 
dral objects (e.g., razors) in a complex 
scene. [Low85] 


screw motion: A 3D transformation 


comprising a rotation about an axis a 
and translation along a. The general 
Euclidean transformation $\mathbf{x} \mapsto R\mathbf{x} + \mathbf{t}$
is a screw transformation if $R\mathbf{t} = \mathbf{t}$.
[Nal93:8.2.1] 


search tree: A data structure that records 


the choices that could be made in a 
problem-solving activity, while search- 
ing through a space of alternative 
choices for the next action or deci- 
sion. The tree could be explicitly 
created or could be implicit in the 
sequence of actions. For example, a 
tree that records alternative model-to- 
data feature matching is a specialized 
search tree used for interpretation tree 
searches. If each non-leaf node has 
two children, we have a binary search 
tree. See also decision tree and tree 
classifier. [DH73:12.4.1] 


SECAM: Sequential Couleur avec 


Mémoire is the television broadcast 
standard in France, the Middle East 
and most of Eastern Europe. SECAM 
uses 625 lines per frame at 25 frames per second. It is
one of three main television standards
throughout the world, the other two 
being PAL (see PAL camera) and NTSC. 
[Jai89:4.1] 


second-derivative operator: A linear
filter estimating the second derivative
from an image at a given point and in a
given direction. Numerically, a simple
approximation of the second derivative
of a 1D function f is the central
(finite) difference, derived from the
Taylor approximation of f:

$$f''_i = \frac{f_{i+1} - 2 f_i + f_{i-1}}{h^2} + O(h)$$

where h is the sampling step (assumed
constant) and O(h) indicates that the
truncation error vanishes as h. A similar
but more complicated approximation
exists for estimating the second
derivative in a given direction in an
image. See also first derivative filter.
[JKS95:5.3]


second fundamental form: See surface
curvature.
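
The central difference given under second-derivative operator can be sketched for a uniformly sampled 1D signal as follows. The function name is illustrative, and the two end samples are simply skipped because they lack a neighbor on one side:

```python
def second_derivative(samples, h):
    """Central-difference estimate of f'' at the interior samples.

    `samples` holds f at uniformly spaced positions with step h; the result
    has two fewer entries than the input.
    """
    if h <= 0:
        raise ValueError("sampling step h must be positive")
    if len(samples) < 3:
        raise ValueError("need at least three samples")
    return [(samples[i + 1] - 2.0 * samples[i] + samples[i - 1]) / (h * h)
            for i in range(1, len(samples) - 1)]
```

For samples of f(x) = x² the estimate equals 2 at every interior point, since the truncation error of this formula involves only higher derivatives.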


seed region: The initial region used 


in a region-growing process, such as 
surface fitting in range data or intensity 
region finding in an intensity image. 
The figure shows a patch on a sur- 
face that is a potential seed region 
for growing the full cylindrical patch: 
[JKS95:3.5] 


segmentation: The problem of dividing 


a data set into parts according to a 
given set of rules. The assumption 
is that the different segments corre- 
spond to different structures in the 
original input domain observed in the 
image. See image segmentation, 
color image segmentation, 
curve segmentation, motion 
segmentation, part segmentation, 
range data segmentation and texture 
segmentation. [FP03:14-14.1.2] 


self-calibration: The problem of estimating
the calibration parameters
using only information extracted from 
a sequence or set of images (typically 
feature point correspondences in 
subsequent frames of a sequence 
or in several simultaneous views), 
as opposed to traditional calibration 
in photogrammetry, which adopts 
specially built calibration objects. 
Self-calibration is intimately related 
to the basic concepts of multi-view 
geometry. See also camera calibration, 
autocalibration, stratification and 
projective geometry. [FP03:13.6] 


self-localization: The problem of esti- 


mating the sensor’s position within 
an environment from image or video 
data. The problem can be cast as 
geometric model matching if models of 
sufficiently complex objects are avail- 
able, i.e., containing enough points 
to allow a full solution of the pose 
estimation problem. In some situa- 
tions it is possible to identify a suffi- 
cient number of landmark points (see 
landmark detection). If no informa- 
tion at all is available about the scene, 
one can still apply tracking or optical 
flow techniques to get correspond- 
ing points over time, or stereo cor- 
respondences in multiple simultane- 
ous frames. See also motion estimation, 
egomotion and stereo correspondence 
problem. [Ols00] 


self-occlusion: Occlusion in which part 


of an object is occluded by another part 
of the same object. In the figure, the 
left leg of the person is occluding their 
right leg: [DF99] 


self-similarity matrix: Given a set of
objects $\{o_1, o_2, \ldots, o_n\}$ described by a
set of property vectors $\{\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n\}$
then one can define a self-similarity
matrix $[M_{ij}]$ where each entry $M_{ij}$ is
defined by some user selected similarity
function $\mathrm{sim}(\mathbf{p}_i, \mathbf{p}_j)$ between
objects $o_i$ and $o_j$. The Euclidean
distance between the vectors $\mathbf{p}_i$ and
$\mathbf{p}_j$ is one of many possible similarity
metrics. Given the matrix, one can
do different things, such as clustering
similar objects together. See spectral
clustering. [MKB79:13.4]


SEM: See scanning electron microscope.


semantic gap: The difference between
two different representations of an
object. For example, one could
describe a square as a set of four
equal-length lines at right angles or
using a specific collection of pixels in
a binary image. While both describe
a square, it would require a lot of
computation to demonstrate that the
two descriptions are largely equivalent.
[WP:Semantic_gap]


semantic image annotation and
retrieval: Image retrieval from a
database based on symbolic descriptors,
e.g., keywords that describe the
image, which have been inferred from
descriptions of the image data. For
example, instead of describing the figure
in terms of, e.g., color histograms,
one can use the histograms and other
information to infer the presence of a
car, people, bicycles etc.:

The descriptors could have a probability
reflecting the certainty that
the description actually holds for the
image. [CV05a]


semantic image segmentation: A
form of image segmentation, usually
into regions that are simultaneously
extracted and labeled with their object
category or identity. This approach
exploits visual context as well as specific
object visual appearance properties
and relationships and is in contrast
to segmentation algorithms that
only use image properties. For example,
segmenting and identifying a road
sign is easier in the context of an outdoor
road scene than in a cluttered
indoor room. [SWRC06]


semantic net: A graph representation 


in which nodes represent the objects 
of a given domain and arcs rep- 
resent the properties and relations 
between objects. See also symbolic 
object representation, graph model 
and relational graph. The figure shows 
an arch and its semantic net represen- 
tation: [BB82:10.2] 


semantic primitive: A meaningful


“thing”, such as an object in a scene 
or image or an instantaneous action in 
a video (e.g. a jumping action). By con- 
trast, a collection of pixels probably 
would not be considered a semantic 
primitive, unless they formed a recur- 
ring pattern. A more abstract seman- 
tic primitive might be a cluster of 
similar feature vectors, in which case 
the semantic primitive is the cluster, 
which may not correspond to an entity 
recognizable by a human. [KTF02] 


semantic region: 1) A region in 


an image that corresponds to some 
semantic primitive, e.g. a nameable 
pattern or object. 

2) An image region that participates in 
multiple behaviors. [WG08] 


semantic region growing: A region 


merging scheme incorporating 
a priori knowledge about adjacent 
regions, e.g., in aerial imagery of 
countryside areas, the fact that roads 
are usually surrounded by fields. Con- 
straint propagation can then be applied 
to achieve a globally optimal region 
segmentation. See also constraint 
satisfaction, relaxation labeling, 
region segmentation, region-based 
segmentation and recursive region 
growing. [BB82:5.5] 


semantic scene segmentation: See 


semantic image segmentation. 


semantic scene understanding: A 


concept related to semantic image 
segmentation but could be slightly 
wider to allow multiple images or 
video, with the similar goals of isolat- 
ing the distinct objects in the image 
data and recognizing their type. An 
associated goal could be to recognize 
and label everything in the data. See 
also image segmentation and object 
recognition. 


semantic texton forest: A forest is an 


ensemble (set) of decision trees, where 
the leaf nodes contain a distribution 
of potential labels for the input struc- 
ture under consideration. The decision 
result comes from averaging the partial 
results of each tree in the ensemble. 
A variant of the forest is the random 
forest which uses different randomly 
generated tests at each splitting node 
in each tree. The texton extension is to 
use a function of the values of one or 
a pair of pixels in an image patch. One 
common application of the semantic 
texton forest is to do semantic image 
segmentation. [SJC08] 


semantic video indexing: Video 


indexing based on conceptual units 
such as words, image patches or 
video clips illustrating the desired 
video content. Contrast with using 
collections of numerical properties 
such as color histograms. [NH01]


semantic video search: See semantic 
video indexing. 


semi-supervised learning: In
supervised learning the dataset
contains a number of input-output
pairs. In semi-supervised learning, the
learner is given more examples for
which only the input is available (these
examples are unlabeled). The goal is
to produce improved performance on
the unlabeled examples by exploiting
information in the labeled examples.
[Mur12:1.5.3]


sensitivity: A binary classifier c(x)


returns + or — labels for an example 
x. Comparing these predictions to the 
actual label gives rise to a true positive 
(TP), true negative (TN), false positive 
(FP) or false negative (FN). The sen- 
sitivity is defined as the true positive 
rate, i.e., TP/(TP + FN), or the percentage
of positive examples that are correctly
labeled. The term is mainly used in 
medical contexts. See also specificity. 
[HTF08:9.2] 
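
The ratio TP/(TP + FN) can be computed directly from paired lists of actual and predicted labels. The following is a minimal sketch; the function name `sensitivity` and the use of the strings "+" and "-" as labels are assumptions made for illustration.

```python
def sensitivity(actual, predicted, positive="+"):
    """Return the true positive rate TP / (TP + FN)."""
    pairs = list(zip(actual, predicted))
    # True positives: actually positive and predicted positive.
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    # False negatives: actually positive but predicted negative.
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("no positive examples: sensitivity is undefined")
    return tp / (tp + fn)


actual = ["+", "+", "+", "-", "-", "+"]
predicted = ["+", "-", "+", "-", "+", "+"]
print(sensitivity(actual, predicted))  # 3 of the 4 positives are found: 0.75
```

Note that the false positive in the fifth position does not affect the result; it would instead lower the specificity.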


sensor: A general word for a mecha- 


nism that records information from 
the “outside world”, generally for pro- 
cessing by a computer. The sensor 
might obtain raw measurements, e.g., 
a video camera, or partially processed 
information, e.g., depth from a stereo 
triangulation process. [BM02:1.9] 


sensor fusion: A vast class of tech- 


niques aiming to combine the dif- 
ferent information contained in data 
from different sensors, in order to 
achieve a richer or more accurate 
description of a scene or action. 
Among the many paradigms for fus- 
ing sensory information are the Kalman 
filter, Bayesian statistical models, fuzzy 
logic, Dempster-Shafer evidential rea- 
soning, production systems and neural 
networks. [WP:Sensor_fusion] 


sensor modeling: The process of char- 


acterizing the capabilities of a sensor, 
such as its optical, spatial and temporal 
frequency response. 


sensor motion compensation: A class 


of techniques aiming to suppress the 
motion of a sensor (or its effects) in 
a video sequence or in data extracted 
from the sequence. A typical exam- 
ple is image sequence stabilization, 
in which a target moving across the 
image in the original sequence appears 
stationary in the output sequence. 
Another example is keeping a robot 
stationary in front of a target using 
only visual data (known as “station 
keeping”). Suppression of jitter in
hand-held video recorders is now com-
mercially available. Basic ingredients
are tracking and motion estimation.
See also egomotion. [SGB+00]


sensor motion estimation: See
egomotion.


sensor network: A collection of sen- 


sors connected through a communi- 
cation channel, which may communi- 
cate raw data or processed results. The 
data could be low level, such as tem- 
perature, brightness or raw video, or 
higher-level objects or events, such as 
recognized objects (e.g., car number 
plates) or the counts of people pass- 
ing. The sensors could communicate 
individually to a base station or for- 
ward results to neighboring sensors, 
which ultimately reach a base station. 
[WP:Wireless_sensor_network] 


sensor path planning: See sensor 
planning. 


sensor placement determination: 
See camera calibration and sensor 
planning. 


sensor planning: A class of techniques 
aimed at determining the optimal sens- 
ing strategies for a reconfigurable sen- 
sor system, normally given a task and a 
geometric model of the target object 
(which may be partially acquired in 
previous views). For example, given 
a geometric feature on an object for 
which a CAD-like model is known and 
the task of verifying the feature’s size, 
a sensor-planning system would deter- 
mine the best position and orienta- 
tion of, say, a single camera and asso- 
ciated illumination for estimating the 
size of each feature. The two basic 
approaches have been generate-and- 
test, in which sensor configurations 
are generated and then evaluated with 
respect to the task constraints, and 
synthetic methods, in which task con- 
straints are characterized analytically 
and the resulting equations solved to 
yield the optimal sensor configuration. 
See also active vision and purposive 
vision. [TAT95] 


sensor position estimation: See pose
estimation.
[WP:Pose_(computer_vision)#Pose_Estimation]


sensor response: The output of a sen- 


sor, or a characterization of some 
key output quantities, given a set of 
inputs. Typically expressed in the fre- 
quency domain, as a function linking 
the magnitude and phase of the Fourier 
transform of the output signal with the 
known frequency of the input. See also 
phase spectrum, power spectrum and 
spectral response. [TH94] 


sensor sensitivity: In general, the weak- 


est input signal that a sensor can 
detect. It can be inferred from the 
sensor response curve. For the com- 
mon CCD sensor of video cameras, sen- 
sitivity depends on various parameters, 
mainly the fill factor (the percentage 
of the sensor’s area actually sensitive 
to light) and the well capacity (the 
amount of charge that a photosensitive 
element can hold). The larger the val- 
ues of the above parameters, the more 
sensitive the camera. See also sensor 
spectral sensitivity. [FHD06] 


sensor spectral sensitivity: A charac-
terization of a sensor’s response in fre-
quency. The figure shows the spec-
tral sensitivity of a typical CCD sen-
sor (actually its spectral response, from
which the spectral sensitivity can be
inferred):
[Figure: spectral response of a typical CCD sensor; vertical axis: strength (A.U.)]
Notice that the high sensitivity of
silicon in the infrared means that IR-
blocking filters should be considered 
for fine measurements, depending on 
camera intensities. We also notice that 
a CCD camera makes a very good sen- 
sor for the near-infrared range (750- 
3000 nm). [WP:Spectral_sensitivity] 


separability: A term used in
classification problems referring
to whether the data is capable of
being split into distinct subclasses by
some automatic decision process. If
property values of two classes overlap,
then the classes are not separable.
The figure shows a linearly separable
circle class and inseparable x and box
classes. [WP:Linear_separability]


separable filter: A 2D (in image pro-
cessing) filter that can be expressed
as the product of two 1D filters, one
acting on rows and the other on
columns. The classic example is
the linear Gaussian filter (see Gaussian
convolution). Separability implies a sig-
nificant reduction in computational
complexity, typically reducing pro-
cessing costs per pixel from O(N²) to
O(2N), where N is the filter size. See also linear
filter and separable template. [Nar81] 
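
The saving can be checked numerically: filtering with an N × N kernel formed as the outer product of a 1D kernel gives the same result as a row pass followed by a column pass with that 1D kernel. The sketch below is illustrative only; the helper `correlate_valid`, the binomial kernel and the test image are assumptions, and borders are simply dropped rather than padded. Because the kernel is symmetric, correlation and convolution coincide here.

```python
def correlate_valid(image, kernel):
    """Slide `kernel` over `image` without padding and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


k = [0.25, 0.5, 0.25]                        # 1D smoothing kernel, N = 3
kernel_2d = [[a * b for b in k] for a in k]  # N x N outer product

# Small synthetic test image (values are arbitrary).
image = [[(3 * r + 5 * c) % 7 for c in range(6)] for r in range(6)]

full = correlate_valid(image, kernel_2d)             # N*N products per pixel
rows = correlate_valid(image, [k])                   # 1 x N pass along rows
separable = correlate_valid(rows, [[v] for v in k])  # N x 1 pass along columns

max_diff = max(abs(a - b)
               for row_a, row_b in zip(full, separable)
               for a, b in zip(row_a, row_b))
print(max_diff)  # the two results agree up to rounding
```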


separable template: A template or 
structuring element in a filter, e.g., a 
morphological filter (see morphology), 
that can be decomposed into a 
sequence of smaller templates, similar 
to separable kernels for linear filters.
The main advantage is a reduction
in the computational complexity of
the associated filter. See also separable
filter. [Gad91]


set theoretic modeling: See
constructive solid geometry.


shading: The pattern formed by the
graded areas of an intensity image,
suggesting light and dark. Varia-
tions in the lightness of surfaces
in the scene may be caused by
variations in illumination, surface
orientation and surface reflectance.
See also illumination and shadow.
[Hor86:10.10]


shading correction: A class of tech-
niques for changing undesirable
shading effects, e.g., strongly uneven
brightness distribution caused by
nonuniform illumination. All tech-
niques assume a shading model,
i.e., a photometric model of image
formation, formalizing the depen-
dency of the measured image
brightness on camera parameters
(typically gain and offset), illumination
and reflectance. See also shadow and
photometry. [TB04]


shading from shape: A technique
recovering the reflectance of isolated
objects given a single image and a
geometric model, but not exactly the
inverse of the classic shape from
shading problem. See also photometric
stereo. [VV90]


shadow: Part of a scene that direct illu-
mination does not reach because of
self-occlusion (attached shadow or self-
shadow) or occlusion caused by other
objects (cast shadow).
This region appears darker than its
surroundings. See also shape from
shading, shading from shape and
photometric stereo. [FP03:5.3.1]


shadow detection: The problem of


identifying image regions correspond- 
ing to shadows in the scene, 
using photometric properties. Use- 
ful for true color estimation and 
region analysis. See also color, color 
image segmentation, color matching, 
photometry and region segmentation. 
[HHD99] 


shadow type labeling: A problem sim- 


ilar to shadow detection, but requir- 
ing classification of different types of 
shadow. [CAK07]


shadow understanding: Estimating var- 


ious properties of a 3D scene, 
e.g., building height, based on the 
appearance or size of shadows. See also 
shadow type labeling. [GDH11] 


shape: Informally, the form of an 
image or scene object. Typically 
described in computer vision through 
geometric representations (see 
shape representation), e.g., modeling 
image contours with polynomials 
or b-splines, or range data patches 
with quadric surfaces. More formal 
definitions are: 

1. (adj) The quality of an object
that is invariant to changes of the 
coordinate system in which it is 
expressed. If the coordinate sys- 
tem is a Euclidean space, this corre- 
sponds to the conventional idea of 
shape. In an affine coordinate sys- 
tem, the change of coordinates may 
be affine, so that, e.g., an ellipse and 
a circle have the same shape. 

2. (n) A family of point sets, any pair 
being related by a coordinate sys- 
tem transformation. 

3. (n) A specific set of n-dimensional
points, e.g., the set of squares. For
example, a curve in $\mathbb{R}^2$ defined
parametrically as $c(t) = (x(t), y(t))$
comprises the point set or shape
$\{c(t) \mid -\infty < t < \infty\}$. The volume
inside the unit sphere in 3D
is the shape $\{x \mid \|x\| < 1,\ x \in \mathbb{R}^3\}$.
[ZRH03:2.3]


shape class: One in a set of classes 
representing different types of shape 
in a given classification, e.g., “locally 
convex” or “hyperbolic” in mean and 
Gaussian curvature shape classification 
of a range image. [TV98:4.4.1] 


shape context: A descriptor for 2D 
image shapes, based on the distri- 
bution of vectors between pairs of 
points. The distribution uses a 2D 
histogram whose bins have a log-polar 
spatial quantization. Taking each point 
p on the shape in turn, the histogram
records the number of other points q
also in the shape, whose relative posi-
tion vector (q − p) lies in each bin. The
histogram is a 2D shape descriptor that 
can be compared, e.g., by using a dot 
product. [BMP02] 
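
The log-polar histogram for one reference point can be sketched as follows. This is a simplified illustration, not the formulation of [BMP02]: the bin counts, the radial range of 1/8 to 2 times the mean distance, the normalization by mean distance, and the names `shape_context` and `histogram_dot` are all assumptions.

```python
import math


def shape_context(points, index, n_radial=3, n_angular=8):
    """Log-polar histogram of the vectors q - p from p = points[index]
    to every other point q. Needs at least two points."""
    px, py = points[index]
    others = [q for i, q in enumerate(points) if i != index]
    dists = [math.hypot(qx - px, qy - py) for qx, qy in others]
    mean_dist = sum(dists) / len(dists)  # makes the descriptor scale invariant
    log_lo, log_hi = math.log(0.125), math.log(2.0)
    hist = [[0] * n_angular for _ in range(n_radial)]
    for (qx, qy), d in zip(others, dists):
        if d == 0:
            continue  # a coincident point has no direction
        # Radial bin: uniform in log(distance), clamped to the outer bins.
        frac = (math.log(d / mean_dist) - log_lo) / (log_hi - log_lo)
        r_bin = min(n_radial - 1, max(0, int(frac * n_radial)))
        # Angular bin: uniform in angle over [0, 2*pi).
        theta = math.atan2(qy - py, qx - px) % (2 * math.pi)
        a_bin = min(n_angular - 1, int(theta / (2 * math.pi) * n_angular))
        hist[r_bin][a_bin] += 1
    return hist


def histogram_dot(h1, h2):
    """Compare two histograms by their dot product."""
    return sum(a * b for row1, row2 in zip(h1, h2) for a, b in zip(row1, row2))


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
h = shape_context(square, 0)
print(sum(sum(row) for row in h))  # every other point falls in exactly one bin: 3
```

Because only relative position vectors are binned, translating all points leaves the histogram unchanged.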


shape decomposition: See segmenta- 
tion and hierarchical modeling. 


shape descriptor: One of a fam- 
ily of numerical descriptors that 


characterize the shape of an object. 
For example, when describing a closed 
region, one might use area, moment 
invariants, shape context, Fourier 
shape descriptors etc., all of which 
characterize some aspect of the shape 
of the region. Other descriptors are 
possible for curves, volumes, symbols, 
trademarks etc. [BB82:Ch. 8] 


shape from contour: A class of algo- 


rithms for estimating the shape of a 3D 
object from the contour it generates 
in an image. A well-known technique, 
shape from silhouette, consists of 
extracting the object’s silhouette from 
a number of views and intersecting 
the 3D cones generated by the sil- 
houettes’ contours and the centers of 
projections. The intersection volume 
is known as the visual hull. Work also 
exists on understanding shape from 
the differential properties of apparent 
contours. [Koe84] 


shape from defocus: A class of algo- 


rithms for estimating scene depth 
at each image pixel, and therefore 
surface shape, from multiple images 
acquired at different, controlled focus 
settings. A closed-form model of the 
relation between depth and image 
focus is assumed, containing a number 
of parameters (e.g., the optics parame- 
ters) that must be calibrated. Depth is 
estimated using this model once image 
readings (pixel values) are available. 
Notice that the camera uses a large 
aperture, so that the points in the scene 
are in focus over the smallest possi- 
ble depth interval. See also shape from 
focus. [Kro87] 


shape from focus: A class of algo- 


rithms for estimating scene depth at 
each image pixel, and therefore surface 
shape, by varying the focus setting of 
a camera until the image achieves opti- 
mal focus (minimum blur) in a neigh- 
borhood of the pixel under exami- 
nation. Obviously, pixels correspond- 
ing to different depths would achieve 
optimal focus for different settings. A 
model of the relation between depth 
and image focus is assumed, contain- 
ing a number of parameters (e.g., the 
optics parameters) that must be cal- 
ibrated. Notice that the camera uses
a large aperture, so that the smallest
possible depth interval generates in- 
focus image points. See also shape 
from defocus. [Kre83] 


shape from interreflection: Standard 


shape from shading or photometric 
stereo algorithms assume that the sur- 
faces are illuminated by a single point 
source. When surfaces are nearby, they 
are also illuminated by interreflections 
from other surfaces, which leads to 
errors in the recovered shape. By mod- 
eling how the interreflections occur, 
algorithms can iteratively remove the 
effect of the interreflections, and thus 
lead to better estimates of the true 
shape. [NIK90] 


shape from line drawings: A class of 


symbolic algorithms inferring 3D prop- 
erties of scene objects (as opposed to 
exact shape measurements, as in other 
“shape from” methods) from line draw- 
ings. First, assumptions are made about 
the type of line drawings admissible, 
e.g., polyhedral objects only, no sur- 
face markings or shadows, maximum 
three lines forming an image junction. 
Then, a dictionary of line junctions is 
formed, assigning a symbolic label to 
every possible appearance of the line 
junctions in space under the given 
assumptions. The figure shows part of 
a simple dictionary of junctions and a 
labeled shape,
where + means planes intersecting in
a convex shape, − means a concave
shape, and the arrows a discontinu-
ity (occlusion) between surfaces. Each
image junction is then assigned the
set of all possible labels that its shape
admits locally (e.g., all possible two- 
line junction labels for a two-line junc- 
tion). Finally, a constraint satisfaction 
algorithm is used to prune labels incon- 
sistent with the context. See also Waltz 
line labeling and relaxation labeling. 
[CSD+09] 


shape from monocular depth cues: 
A class of algorithms estimating shape 
from information related to depth 
detected in a single image, i.e., from 
monocular cues. See shape from 
contour, shape from line drawings, 
shape from perspective, shape from 
shading, shape from specularity, shape 
from structured light and shape from 
texture. [HJ93] 


shape from motion: A vast class of 
algorithms for estimating 3D shape 
(structure), and often depth, from the 
motion information contained in an 
image sequence. Methods exist that 
rely on tracking sparse sets of image 
features (e.g., the Tomasi-Kanade 
factorization) as well as dense motion 
fields, i.e., optical flow, seeking to 
reconstruct dense surfaces. See also 
motion factorization. [JKS95:11.3] 


shape from multiple sensors: A class 
of algorithms recovering shape from 
information collected from a num- 
ber of sensors of the same type (see 
multi-view stereo) or of different types 
(see sensor fusion). [ME88] 


shape from optical flow: See optical 
flow. 


shape from orthogonal views: See 
shape from contour. 


shape from perspective: A class of 
techniques estimating depth for var- 
ious features from perspective cues, 
e.g., the fact that a translation along 
the optical axis of a perspective 
camera changes the size of the imaged 
objects. See also pinhole camera 
model. [SQ04:9A.2.1] 


shape from photo consistency: A 
technique based on space carving for 
recovering shape from multiple views 
(photos). The basic constraint is that 
the underlying shape must be “photo- 
consistent” with all the input photos, 


i.e., roughly speaking, it must give rise 
to compatible intensity values in all 
cameras. [KS00]


shape from photometric stereo: See 
photometric stereo. 


shape from polarization: A technique 
recovering local shape from the polar- 
ization properties of a surface under 
observation. The basic idea is to illumi- 
nate a surface with known polarized 
light, estimate the polarization state 
of the reflected light, then use this 
estimate in a closed-form model link- 
ing the surface normals with the mea- 
sured polarization parameters. In prac- 
tice, polarization estimates can be 
noisy. This method can be useful wher- 
ever intensity images do not provide 
information, e.g., featureless specular 
surfaces. See also polarization-based 
methods. [Wol92] 


shape from scatter trace: A method 
of shape recovery from transparent or 
translucent objects by combining mul- 
tiple observations of a point with a 
moving light source (or a moving cam- 
era, or both). The set of measurements 
at each pixel is its scatter trace. [MK07] 


shape from shading: The problem of 
estimating shape, here in the sense of 
a field of normals from which a sur- 
face can be recovered up to a scale 
factor, from the shading pattern (light 
and shadows) of an image. The key 
idea is that, assuming a reflectance map 
for the scene (typically a Lambertian 
surface), an image irradiance equation 
can be written linking the surface nor- 
mals to the illuminant direction and the 
image intensity. The constraint can be 
used to recover the normals assuming 
local surface smoothness. [JKS95:9.4] 


shape from shadows: A technique for 
recovering geometry from a number of 
images of an outdoor scene acquired 
at different times, i.e., with the sun 
at different angles. Geometric infor- 
mation can be recovered under vari- 
ous assumptions and knowledge of the 
sun’s position. Also called “shape from 
darkness”. See also shape from shading 
and photometric stereo. [CL89] 


shape from silhouette: See shape from 


contour. 


shape from specularity: A class of algo- 


rithms for estimating local shape from 
surface specularities. A specularity
constrains the surface normal: the
incident and reflection angles must
coincide. The detection of speculari- 
ties in images is, in itself, a non-trivial 
problem. [HB88] 


shape from structured light: See 


structured light triangulation. 


shape from texture: The problem of 


estimating shape, here in the sense of 
a field of normals from which a sur- 
face can be recovered up to a scale fac- 
tor, from the image texture. The defor- 
mation of a planar texture recorded 
in an image (the texture gradient) 
depends on the shape of the surface 
to which the texture is applied. Tech- 
niques exist for shape estimation from 
statistical texture and regular texture 
patterns. [FP03:9.4-9.5] 


shape from X: A generic term for a 


method that generates 3D shape or 
position estimates from one of a vari- 
ety of possible techniques, such as 
shape from multiple sensors, shape 
from shading, shape from focus etc. 
[TV98:9.1] 


shape from zoom: The problem of com- 


puting shape (in the sense of the dis- 
tance of each scene point from the sen- 
sor) from two or more images acquired 
at different zoom settings, achieved 
through a zoom lens. The basic idea 
is to differentiate the projection equa- 
tions with respect to the focal length, 
f, achieving an expression linking the 
variations of f and pixel displacement 
with depth. [MO90] 


shape grammar: A grammar specifying 


a class of shapes, whose rules spec- 
ify patterns for combining more primi- 
tive shapes. Rules are composed of two 
parts: a description of a specific shape 
and how to replace or transform it. 
Used also in design, CAD and architec- 
ture. See also production system and 
expert system. [BB82:6.3.2] 


shape index: A measure, usually indi-
cated by S, of the type of shape of a
surface patch in terms of its principal
curvatures. Formally,

$$S = \frac{2}{\pi} \arctan \frac{\kappa_M + \kappa_m}{\kappa_M - \kappa_m}$$

where $\kappa_m$ and $\kappa_M$ are the principal cur-
vatures. S is undetermined for planar
patches. A related parameter, R, mea-
sures the amount of curvedness of the
patch:

$$R = \sqrt{(\kappa_m^2 + \kappa_M^2)/2}$$

All curvature-based shape classes map
to the unit circle in the R-S plane,
with planar patches at the ori-
gin. See also mean and Gaussian
curvature shape classification and 
shape representation. [KvD92] 
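
As a worked illustration, the sketch below evaluates S = (2/π) arctan((κ_M + κ_m)/(κ_M − κ_m)) and R = sqrt((κ_m² + κ_M²)/2) for a few canonical patches. The function name is hypothetical, `atan2` is used so that umbilic patches (κ_M = κ_m) do not divide by zero, and the sign of S depends on the curvature sign convention, which is assumed here rather than taken from the source.

```python
import math


def shape_index_and_curvedness(k_max, k_min):
    """Return (S, R) for principal curvatures k_max >= k_min.
    S is None for a planar patch, where it is undetermined."""
    if k_max < k_min:
        k_max, k_min = k_min, k_max  # enforce the ordering
    curvedness = math.sqrt((k_min ** 2 + k_max ** 2) / 2.0)
    if k_max == 0 and k_min == 0:
        return None, curvedness
    # k_max - k_min >= 0, so atan2 stays in [-pi/2, pi/2] and S in [-1, 1].
    s = (2.0 / math.pi) * math.atan2(k_max + k_min, k_max - k_min)
    return s, curvedness


for name, (k1, k2) in {"sphere-like": (1.0, 1.0),
                       "cylinder-like": (1.0, 0.0),
                       "saddle": (1.0, -1.0),
                       "plane": (0.0, 0.0)}.items():
    print(name, shape_index_and_curvedness(k1, k2))
```

Under these assumptions the symmetric saddle gives S = 0, the cylinder-like patch gives S = 0.5 and the umbilic patch gives S = 1, while the plane has R = 0 and no defined S.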


shape magnitude class: Part of a 
local surface curvature representation 
scheme in which each point has a 
surface class and a magnitude of cur- 
vature (shape magnitude). This repre- 
sentation is an alternative to the more 
common shape classification based on 
either the two principal curvatures 
or the mean and Gaussian curvature. 
[KvD92] 


shape matching: Matching could be at 
a high level, in which one is compar- 
ing semantic descriptions in terms of 
the different parts of the shapes (e.g., 
matching a mountain bicycle to a rac- 
ing bicycle), or at a low level, where 
one establishes point-to-point corre- 
spondence between the shapes (e.g., 
when matching faces). [BMP02] 


shape modeling: Constructing some 
sort of compact representation of 
a shape, through, e.g., geometric 
modeling for exact models or point 
distribution models for modeling a fam- 
ily of shapes. Usually the model is a 
somewhat simplified or abstracted rep- 
resentation of the real object. 


shape moment: A moment as applied to 
a 2D region or 3D volume. 


shape prior: A piece of domain knowl- 
edge that helps constrain the space of 
all possible shapes, e.g., as used dur- 
ing shape recognition. Examples could 
be property based, such as a prior that 
tries to maximize the smoothness of a
boundary, or probabilistic, such as a
distribution of possible shapes (shape 
parameters). [GCK91] 


shape recognition: Recognizing either 


the class of a shape, e.g., that it is a star 
shape or is “volcano-shaped”, or the 
specific shape it has, e.g., a particular 
trademark’s shape. [MZS03] 


shape recovery: Reconstructing the 3D 


shape of an object from image data. 
There are many methods for this. See 
shape from X. 


shape representation: A large class 


of techniques seeking to capture the 
salient properties of shapes, both 2D 
and 3D, for analysis and comparison 
purposes. Many representations have 
been proposed in the literature, 
including skeletons for 2D and 3D 
shapes (see medial axis skeletonization 
and distance transform), curvature- 
based representations (e.g., the 
curvature primal sketch, the curvature 
scale space and the extended 
Gaussian image), generalized cones 
for articulated objects, invariants, 
and flexible object models (e.g., 
snakes, deformable superquadrics
and deformable template models).
[ZRH03:2.3]


shape template: A geometric pattern 


used for matching with the image 
data, e.g., by correlation matching. The 
shape could be rigid or parameter- 
ized. The template is usually swept 
over a region of the image (or sub- 
jected to Fourier matched-filter object 
recognition). In the figure, the tem- 
plate is the letter “C” which is matched 
against the text image. [ZDDD07]


shape texture: The texture of a surface
from the point of view of the varia- 
tion in the shape, as contrasted to the 
variation in the reflectance patterns on 
the surface. See also surface roughness 
characterization. [WY08] 


shapeme histogram: A shapeme is a dis- 
tinctive cluster of shape features on 
the boundary of a 2D shape or sur- 
face of a 3D shape, giving a 2D shape 
descriptor or 3D shape descriptor. The 
shapeme histogram records a count of 
the instances of the different shapemes 
on the object. Object recognition is 
based on matching histograms. The fig- 
ure shows (left) an L-shape with three 
shapemes consisting of a convex cor- 
ner, a concave corner and a straight 
section and (right) its shapeme his-
togram. [SSMK06]


sharp–unsharp masking: A form of
image enhancement that makes the 
edges of image structures crisper. The 
operator can either add a weighted 
amount of a gradient or high-pass 
filter of the image or subtract a 


weighted amount of a smoothing filter 
or low-pass filter of the image. The fig- 
ure shows (left) an image and (right) 
an unsharp masked version of it: 
[Umb98:4.3] 
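
The "add a weighted high-pass component" form can be sketched as output = image + amount × (image − blurred), where the difference between the image and a smoothed copy acts as the high-pass term. The 3 × 3 box blur, the replicated borders and the parameter name `amount` are assumptions chosen to keep the example short; a Gaussian blur is the more usual choice.

```python
def box_blur(image):
    """3 x 3 mean filter with replicated borders (a simple low-pass filter)."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), h - 1)  # clamp to the image
                    cc = min(max(c + dc, 0), w - 1)
                    total += image[rr][cc]
            out[r][c] = total / 9.0
    return out


def unsharp_mask(image, amount=1.0):
    """Add `amount` times the high-pass residual (image - blurred)."""
    blurred = box_blur(image)
    return [[image[r][c] + amount * (image[r][c] - blurred[r][c])
             for c in range(len(image[0]))]
            for r in range(len(image))]


# A vertical step edge: dark on the left, bright on the right.
step = [[0, 0, 10, 10] for _ in range(4)]
for row in unsharp_mask(step):
    print([round(v, 2) for v in row])
```

The output undershoots on the dark side of the edge and overshoots on the bright side, which is what makes the edge look crisper; flat regions are left unchanged.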


shear transformation: An affine image 
transformation changing one coordi- 
nate only. The corresponding transfor-
mation matrix, S, is equal to the iden-
tity apart from $s_{12} = s_x$, which changes
the first image coordinate. Shear on the
second image coordinate is obtained
similarly by $s_{21} = s_y$. The figure shows
the result of a shear transformation: 
[SQ04:9.1] 
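
A small sketch of the matrix form, applied to the corners of a unit square. The helper names `shear_matrix` and `apply_2x2` are hypothetical; for a shear in the sense of this entry, only one of the two off-diagonal terms should be nonzero.

```python
def shear_matrix(s_x=0.0, s_y=0.0):
    """Identity matrix with s12 = s_x and s21 = s_y."""
    return ((1.0, s_x),
            (s_y, 1.0))


def apply_2x2(matrix, point):
    """Multiply a 2 x 2 matrix by a 2D point (x, y)."""
    (a, b), (c, d) = matrix
    x, y = point
    return (a * x + b * y, c * x + d * y)


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
S = shear_matrix(s_x=0.5)  # changes only the first coordinate
print([apply_2x2(S, p) for p in square])
# [(0.0, 0.0), (1.0, 0.0), (1.5, 1.0), (0.5, 1.0)]
```

The bottom edge stays in place while the top edge slides by s_x, turning the square into a parallelogram.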




shock graph: A graph description of the 
medial axis of a 2D planar shape. The 
four node types are based on the radius 
function along the axis (1=monotonic, 
2=local minimum radius, 3=constant, 
4=local maximum radius). The graph 
can be organized into a tree for 
efficient object recognition by graph 
matching. The figure shows a simple 
shape with its overlaid medial axis and 
corresponding shock graph: [SSDZ99] 


shock tree: A 2D shape representation 
technique based on the singularities 
(see singularity event) of the radius 
function along the medial axis (MA). 
The MA is represented by a tree with 
the same structure, and is divided 
into continuous segments of uni- 
form behavior (local maximum, local 
minimum, constant, monotonic). See 
also medial axis skeletonization and 
distance transform. [SSDZ99] 


short baseline stereo: See narrow 
baseline stereo. 


shot noise: See impulse noise and 
salt-and-pepper noise. 


shutter: A device allowing the light into 
a camera for enough time to form 
an image on a photosensitive film 
or chip. Shutters can be mechanical, 
as in traditional photographic cam- 
eras, or electronic, as in a digital 
camera. In the former case, a window- 
like mechanism is opened to allow 
the light to be recorded by a pho- 
tosensitive film. In the latter case, 
a CCD or other type of sensor is
triggered electronically to record the
amount of incident light at each pixel. 
[WP:Shutter_(photography)] 


shutter control: The device controlling 


the length of time that the shutter 
is open.
[WP:Exposure_(photography)#Exposure_control]


side-looking radar: A radar projecting a 
fan-shaped beam illuminating a strip of 
the scene at the side of the instrument, 
typically used for mapping a large area. 
The map is produced as the instrument 
is carried along by a vehicle sweep- 
ing the surface to the side. See also 
acoustic sonar. [Leb79] 


SIFT: A feature point descriptor that
aims to give a distinctive signature
for the pattern of intensity values in
a 16 × 16 neighborhood around the
feature point. The descriptor is com-
puted from eight-bin histograms of
the gradient magnitudes and directions,
one for each of the sixteen 4 × 4 blocks
within the 16 × 16 pixel neighborhood.
The 16 histograms are concatenated to
form a 128-element vector.
[Low04] 
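
The layout of the 128 values (16 blocks × 8 direction bins) can be sketched as below. This is a deliberately simplified illustration and not the method of [Low04]: it omits the Gaussian weighting, the interpolation between bins and the rotation to a dominant orientation, and the function name and central-difference gradients are assumptions.

```python
import math


def sift_like_descriptor(patch):
    """Build a 128-element vector from a 16 x 16 intensity patch:
    one 8-bin gradient-direction histogram per 4 x 4 block."""
    assert len(patch) == 16 and all(len(row) == 16 for row in patch)
    desc = [0.0] * 128
    for r in range(16):
        for c in range(16):
            # Central differences, clamped at the patch border.
            gx = patch[r][min(c + 1, 15)] - patch[r][max(c - 1, 0)]
            gy = patch[min(r + 1, 15)][c] - patch[max(r - 1, 0)][c]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            theta = math.atan2(gy, gx) % (2 * math.pi)
            direction_bin = min(7, int(theta / (2 * math.pi) * 8))
            block = (r // 4) * 4 + (c // 4)  # which of the 16 blocks
            desc[block * 8 + direction_bin] += magnitude
    norm = math.sqrt(sum(v * v for v in desc))
    return [v / norm for v in desc] if norm > 0 else desc


# Horizontal intensity ramp: every gradient points along +x.
ramp = [[c for c in range(16)] for _ in range(16)]
d = sift_like_descriptor(ramp)
print(len(d), sum(1 for v in d if v > 0))  # 128 values, one nonzero bin per block
```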


signal coding system: A system for 


encoding a signal into another, typi- 
cally for compression or security pur- 
poses. See image compression and 
digital watermarking. [CSE00]


signal processing: The collection of 


mathematical and computational tools 
for the analysis of typically 1D (but 
also 2D, 3D etc.) signals such as 
audio recordings or other intensity 
against time or position measurements. 
Digital signal processing is the sub- 
set of signal processing which per- 
tains to signals that are represented as 
streams of binary digits.
[WP:Signal_processing]


signal-to-noise ratio (SNR): A measure 


of the relative strengths of the inter- 
esting and uninteresting (noise) parts 
of a signal. In signal processing, SNR 
is usually expressed in decibels as 
the ratio of the power of signal and 
noise, i.e., $10 \log_{10}(P_s / P_n)$, where $P_s$
and $P_n$ are the signal and noise
powers. With statistical
noise, the SNR can be defined as 10 
times the log of the ratio of the stan- 
dard deviations of signal and noise. 
[Jai89:3.6] 
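
A minimal sketch of the power-ratio form in decibels, taking power as the mean squared sample value. The function name `snr_db` and the sample sequences are assumptions for illustration.

```python
import math


def snr_db(signal, noise):
    """Return 10 * log10(P_signal / P_noise), with power taken as the
    mean of the squared sample values."""
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    return 10.0 * math.log10(p_signal / p_noise)


signal = [10.0, -10.0, 10.0, -10.0]  # power 100
noise = [1.0, -1.0, 1.0, -1.0]       # power 1
print(snr_db(signal, noise))         # power ratio of 100 gives 20 dB
```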


signature curve: Consider a smooth pla-
nar curve such as the boundary of a
region found by region segmentation,
and parameterize that curve by arc
length s. Let $\kappa(s)$ be the curvature
of the curve and $\kappa_s(s)$ be its deriva-
tive with respect to arc length. Then
$(\kappa(s), \kappa_s(s))$ gives the Euclidean sig-
nature curve. Extensions exist for an
affine signature curve using the affine 
curvature and affine arc length. A sig- 
nature curve can be a useful shape 
descriptor for object recognition of 
planar shapes. [SL05]


signature identification: A class 


of techniques for verifying a writ- 
ten signature. Also known as 
“dynamic signature verification”. 
An area of biometrics. See also 
handwriting verification, handwritten 
character recognition, fingerprint 
identification and face identification. 
[WP:Handwriting_recognition]


signature verification: The prob- 


lem of authenticating a signature 
automatically with image-processing 
techniques; in practice, deciding 
whether a signature matches a 
specimen sufficiently well. See 
also handwriting verification and 
handwritten character recognition. 
[WP:Handwriting_recognition] 


silhouette: See object contour. 


SIMD: See single instruction multiple 


data. 


similarity: The property that makes 


two entities (images, models, objects,
features, shape, intensity values etc.) 
or sets thereof similar, that is, how 
they resemble each other. A similarity 
transformation creates perfectly simi- 
lar structures and a similarity metric 
quantifies the degree of similarity 
of two possibly non-identical struc- 
tures. Examples of similar structures 
are two polygons identical except 
for a change in size; two image 
neighborhoods whose intensity values 
are identical except for scaling by 
a multiplicative factor. The concept 
of similarity lies at the heart of sev- 
eral classic vision problems, including 
the stereo correspondence problem, 


image matching and geometric model 
matching. [JKS95:14.3] 


similarity metric: A metric quantify- 
ing the similarity of two entities. 
For instance, cross correlation is a 
common similarity metric for image 
regions. For similarity metrics on spe- 
cific objects encountered in vision, 
see feature similarity, graph similarity 
and gray scale similarity. See also 
point similarity measure and matching 
method. [DH73:6.7] 
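
As an illustration of cross correlation used as a similarity metric for image regions, the sketch below computes a zero-mean normalized cross correlation between two equally sized regions. The choice of the zero-mean normalized variant and the function name are assumptions; other forms of cross correlation exist.

```python
import math


def normalized_cross_correlation(region_a, region_b):
    """Zero-mean normalized cross correlation of two equally sized
    regions, in [-1, 1]; 1 means identical up to gain and offset."""
    a = [v for row in region_a for v in row]
    b = [v for row in region_b for v in row]
    if len(a) != len(b):
        raise ValueError("regions must have the same number of pixels")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    if den == 0:
        raise ValueError("a constant region has no defined correlation")
    return num / den


patch = [[1, 2], [3, 4]]
brighter = [[7, 9], [11, 13]]  # 2 * patch + 5
inverted = [[4, 3], [2, 1]]
print(normalized_cross_correlation(patch, brighter))  # close to 1
print(normalized_cross_correlation(patch, inverted))  # close to -1
```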


similarity transformation: A 
transformation that changes an 
object into a similar-looking one;
formally, a conformal mapping pre-
serving the ratio of distances (the
magnification ratio). The transfor-
mation matrix, T, can be written as
$T = B^{-1}AB$, where T and A are similar
matrices, that is, representing the 
same transformation after a change 
of basis. Examples include rotation, 
translation, expansion and contraction 
(scaling). [SQ04:9.1] 


simple lens: A lens composed by a single 
piece of refracting material, shaped in 
such a way as to achieve the desired 
lens behavior. For example, a convex 
focusing lens. [Hor86:2.3] 


simulated annealing: Simulated anneal- 
ing is a generic heuristic method 
for the optimization of an objective 
function E(x). We consider minimiza- 
tion of E(x), corresponding to the 
physics origin of the problem, where 
a minimum energy configuration is 
desired. Under the Boltzmann distribu- 
tion, the probability of configuration 
x is given by $p(x) \propto \exp(-E(x)/T)$,
where T is the temperature (and Boltz-
mann’s constant has been set to 1). At
very high temperatures the probabil-
ity distribution over the states is uni-
form, but at T = 0 the state(s) with the
minimum value of the objective have
all of the probability mass. The algo-
rithm works by proposing a change
x′ to the current state and accept-
ing this change depending on
$(E(x') - E(x))/T$, with greater propensity to
accept downhill changes. (The algo- 
rithm gives a nonzero probability of 
accepting “uphill” moves, which helps 


it avoid the local minima of greedy 
search.) This process is run while grad- 
ually decreasing (or “annealing”) the 
temperature to zero. [PTVF92:10.9] 
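
The procedure in the simulated annealing entry can be sketched as follows. This is a minimal illustration, not a reference implementation: the Metropolis acceptance rule (accept an uphill move with probability exp(-(E(x') - E(x))/T)), the geometric cooling schedule, the Gaussian proposal and the test function `energy` are all assumptions chosen for the example.

```python
import math
import random


def simulated_annealing(energy, propose, x0, t_start=1.0, t_end=1e-3,
                        cooling=0.995, rng=None):
    """Minimize energy(x), tracking the best state seen so far."""
    rng = rng or random.Random(0)          # fixed seed for repeatability
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t_start
    while t > t_end:
        x_new = propose(x, rng)
        e_new = energy(x_new)
        delta = e_new - e
        # Downhill moves are always accepted; uphill moves are accepted
        # with probability exp(-delta / t) (assumed Metropolis rule).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
        t *= cooling                        # assumed geometric cooling
    return best_x, best_e


def energy(x):
    """Hypothetical 1D objective with several local minima."""
    return x * x + 10.0 * math.sin(3.0 * x) ** 2


def propose(x, rng):
    """Hypothetical proposal: a small Gaussian perturbation."""
    return x + rng.gauss(0.0, 0.5)


best_x, best_e = simulated_annealing(energy, propose, x0=4.0)
```

The returned energy can never exceed the starting energy, because the best state is tracked separately from the current state.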


single-camera system: A vision system 


that uses only one camera. Contrast 
with a multi-camera system. 


single instruction multiple data
(SIMD): A computer architecture
allowing the same instruction to be 
simultaneously executed on multiple 
processors and thus different portions 
of the data set (e.g., different pix- 
els or image neighborhoods). Useful 
for a variety of low-level image pro- 
cessing operations. See also MIMD, 
pipeline parallelism, data parallelism 
and parallel processing. [Sch89:Ch. 8] 


single-layer perceptron network: A 


form of neural network used for 
supervised learning problems. It maps 
input data x of dimension d to a 
space of outputs y of dimension d'.
A single-layer perceptron network is
characterized by a d' × d matrix of
weights (parameters) W and a transfer
function σ so that $f(x) = \sigma(Wx)$,
where σ is applied element-wise to
vector arguments. Each output unit is
a perceptron, with σ being its activation
function; the logistic sigmoid function
$\sigma(z) = (1 + e^{-z})^{-1}$ is a common
choice. See also multilayer perceptron 
network. [Bis06:5.1] 
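
The mapping f(x) = σ(Wx) from the single-layer perceptron network entry can be written in a few lines. This is a sketch assuming NumPy; the weight values and the input vector are arbitrary illustrative numbers, and no bias term or training step is included.

```python
import numpy as np


def logistic(z):
    """Logistic sigmoid, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


def single_layer_perceptron(W, x):
    """Map an input of dimension d to d' outputs: f(x) = sigma(W x)."""
    return logistic(W @ x)


# Illustrative weights: d' = 2 output units, d = 3 inputs.
W = np.array([[1.0, -1.0, 0.0],
              [0.5, 0.5, 0.5]])
x = np.array([1.0, 1.0, 2.0])
y = single_layer_perceptron(W, x)   # W @ x = [0.0, 2.0]
```

Each row of W holds the weights of one output unit, so the first output here is σ(0) = 0.5.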


single-lens reflex camera: A camera 


that uses a mirror system to allow the 
photographer to see what image will 
be captured. These cameras are pop- 
ular for quality photography because 
of the ability to interchange lenses and 
filters. [WP:Single-lens_reflex_camera] 


single photon emission computed
tomography (SPECT): A medical
imaging technique that involves the
rotation of a photon detector array
around the body in order to detect
photons emitted by the decay of previously
injected radionuclides. This
technique is particularly useful for
creating a volumetric image showing
metabolic activity. Resolution is lower
than PET but imaging is cheaper and
some SPECT radiopharmaceuticals
may be used where PET nuclides
cannot. [WP:Single-photon emission
computed tomography]


singular value decomposition (SVD):
A factorization of any m × n matrix A
into $A = UDV^T$. The columns of the
m × m matrix U are mutually orthogonal
unit vectors, as are the columns
of the n × n matrix V. The m × n
matrix D is diagonal, and its diagonal
elements, the singular values $\sigma_i$,
satisfy $\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_n \geq 0$. The SVD
has extremely useful properties. For
example:

• a square matrix A is nonsingular if and
  only if all its singular values are nonzero,
  and the number of nonzero singular
  values gives the rank of A;

• the columns of U corresponding to
  the nonzero singular values span
  the range of A; the columns of V
  corresponding to the zero singular
  values span the null space of A;

• the squares of the nonzero singular
  values are the nonzero eigenvalues
  of both $AA^T$ and $A^TA$, the columns
  of U are eigenvectors of $AA^T$ and the
  columns of V of $A^TA$.

Moreover, the pseudoinverse of a
matrix, occurring in the solution of
rectangular linear systems, can be easily
computed from the SVD definition.
[FP03:12.3.2]
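
The properties listed for the singular value decomposition can be checked numerically. The sketch below assumes NumPy's `numpy.linalg.svd`; the example matrix and the relative tolerance used to decide which singular values count as zero are illustrative assumptions.

```python
import numpy as np

# Illustrative 3 x 2 matrix of rank 1 (second column is twice the first).
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 0.0]])

# U is 3 x 3, Vt is V transposed (2 x 2), s holds the singular values
# in descending order.
U, s, Vt = np.linalg.svd(A)

# Assumed tolerance for treating a singular value as zero.
tol = 1e-10 * s[0]
rank = int(np.sum(s > tol))

range_basis = U[:, :rank]     # columns of U for nonzero singular values
null_basis = Vt[rank:].T      # columns of V for zero singular values

# Pseudoinverse: invert only the nonzero singular values.
s_inv = np.zeros_like(s)
s_inv[:rank] = 1.0 / s[:rank]
k = len(s)
A_pinv = Vt.T[:, :k] @ np.diag(s_inv) @ U[:, :k].T
```

Here the single nonzero singular value is 5, whose square (25) is the nonzero eigenvalue of both products of A with its transpose.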


singularity event: A point in the domain 


of the map of a geometric curve or sur- 
face where the first derivatives vanish. 
[WP:Singular_point_of_a_curve] 


sinusoidal projection: A family of linear
image transforms, C, the rows of
which are the eigenvectors of a special
symmetric tridiagonal matrix. This
includes the discrete cosine transform
(DCT). [Jai89:5.12]


situation graph tree: A behavior 
representation for describing 
activities, including the alternative 
sequences that actions can take, used 
for video-based behavior classification. 
A graph is formed where nodes are 
based on schemas that contain logical 
predicates for recognizing a state and 
for the actions expected when in 
that state. The arcs are probabilistic 




transitions between the schemas. The 
states can be expanded hierarchically, 
creating a tree of subgraphs. [GVRV04] 


situational awareness: A general psy- 


chological term referring to the abil- 
ity to perceive the important fac- 
tors in an environment in such a 
manner as to be able to make 
predictions about how the environ- 
ment will change. In the context of 
image analysis, it generally refers to 
coupling scene understanding with 
event understanding. [WP:Situation_ 
awareness] 


skeleton: A curve, or tree-like set of 


curves, capturing the basic structure 
of an object. The figure shows a linear 
skeleton for a puppet-like 2D shape: 


The curves forming the skeleton are 
typically central to the shape. Sev- 
eral algorithms exist for computing 
skeletons, e.g., the medial axis trans- 
form (see medial axis skeletonization) 
and the distance transform, for which 
the grassfire algorithm can be applied. 
[Jai89:9.9] 


skeleton by influence zones (SKIZ): 


Commonly known as the Voronoi 
diagram. [SQ04:7.3.2] 


skeleton model: An articulated model 


consisting of rigid links connected 
by joints, typically used for modeling 
humans, robots, animals, etc., where 
the rigid links represent the limbs. 
[SBS02]


skeletonization: A class of techniques 


that try to reduce a 2D (or 3D) binary 
image to a “skeleton” form in which 


every remaining pixel is a skeleton 
pixel, but the essential shape of the 
input image is captured. Definitions 
of the skeleton include the set of 
centers of circles bitangent to the 
object boundary and smoothed local 
symmetries. [Sch89:Ch. 6] 


sketch-based image retrieval: A type 


of image retrieval in which the 
database index is based on a user's 
sketch of the desired target. For exam- 
ple, it may be a sketch of a face, a 
mechanical part or a trademark. Part 
of what makes this retrieval difficult is 
the fact that the sketch will not be a 
faithful copy of the desired target, yet 
the expectation is that the shape of the 
sketch is relatively accurate. Typically, 
the retrieval will be based on the shape 
properties of the sketch rather than 
on the color and texture statistics, as 
in many other image-retrieval applica- 
tions. [GR95] 


skew: An error introduced in the imaging 


geometry by a non-orthogonal pixel 
grid, in which rows and columns of 
pixels do not form an angle of exactly 
90°. This is usually considered only in 
high-accuracy photogrammetry appli- 
cations. [JKS95:12.10.2] 


skew correction: A transformation 


compensating for the skew error. 
[JKS95:12.10.2] 


skew symmetry: A skew symmetric
contour is a planar contour such that
every straight line oriented at an angle
θ with respect to a particular axis,
called the “skew symmetry axis” of
the contour, intersects the contour at
two points equidistant from the axis.
[BB82:9.5.4]

[Figure: a skew symmetric contour and its skew symmetry axis]


skin color analysis: A set of tech- 


niques for color analysis applied 
to images containing skin, e.g., for 
retrieving images from a database 


(see color-based image retrieval). See 
also color, color image, color image 
segmentation, color matching and 
colorimetry. [TOS+03] 


skin color model: A statistical model 


of the appearance of human skin in 
images. Typical models might be based 
on histograms of observed pixel col- 
ors or Gaussian mixture models. The 
core underlying observation is that, 
when corrected for lightness, almost 
all human skin has a similar color, 
which is distinct from many of the 
other observed colors in scenes. Com- 
plications include variations in scene 
lighting and shadows. Skin color mod- 
els are commonly used in applica- 
tions such as face detection or online 
pornography screening. [JR99] 


SKIZ: See skeleton by influence zones. 


SLAM: Simultaneous localization and 


mapping. A vision algorithm used 
particularly by the mobile robotics 
community. It allows the incremen- 
tal construction and update of a 
geometric model by a robot as it 
explores an unknown environment. 
Given the constructed partial model, 
the robot can determine its location 
(self-localization) relative to the model. 
[TBF05:Ch. 10]


slant: The angle between a surface 


normal in the scene and the viewing 
direction.

[Figure: the slant angle between the surface normal and the direction of view]


See also tilt and shape from texture. 
[FP03:9.4.1] 


slant normalization: A class of algo- 


rithms used in handwritten charac- 
ter recognition, transforming slanted 
cursive characters into vertical ones. 
See handwritten character recognition 
and optical character recognition. 
[EGSS99] 


slice-based reconstruction: The
reconstruction of a 3D object from a
number of planar slices, or sections taken
across the object. The slice plane is typ- 
ically advanced at regular spatial inter- 
vals to sweep the working volume. 
See also tomography, computed axial 
tomography, single photon emission 
computed tomography and nuclear 
magnetic resonance. [SSPP07] 


sliding window: A common compo- 
nent of an image-processing (and 
signal-processing) algorithm whereby 
the calculation is based on a 
neighborhood or window of data 
about the current point. After the cal- 
culation is done at the given point, the 
calculation typically moves to an adja- 
cent point and the neighborhood (win- 
dow) moves (slides) to that adjacent 
point. [SOS00:7.1.3]

[Figure: a sliding window moving across an image]


slope density function: This is the 
histogram of the tangential orienta- 
tions (slopes) of a curve or region 
boundary. It can be used to represent 
the curve shape in a manner invariant 
to translation and rotation (up to a shift 
of the density function). [BB82:8.4.5] 
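
A slope density function for a closed polygonal boundary can be sketched as below. This assumes NumPy; weighting each edge by its length, the number of histogram bins and the rotated-square test shape are illustrative choices rather than part of the definition.

```python
import numpy as np


def slope_density(points, bins=8):
    """Length-weighted histogram of edge orientations of a closed polygon."""
    pts = np.asarray(points, dtype=float)
    edges = np.roll(pts, -1, axis=0) - pts            # edge i: pts[i] -> pts[i+1]
    angles = np.arctan2(edges[:, 1], edges[:, 0]) % (2.0 * np.pi)
    lengths = np.hypot(edges[:, 0], edges[:, 1])
    hist, _ = np.histogram(angles, bins=bins, range=(0.0, 2.0 * np.pi),
                           weights=lengths)
    return hist / hist.sum()


# Unit square rotated by 0.3 rad, so no edge angle sits on a bin boundary.
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
rotated = square @ R.T

density = slope_density(rotated)
shifted = slope_density(rotated + np.array([5.0, -2.0]))
```

Translating the shape leaves the histogram unchanged, while rotating it shifts the histogram circularly, which is the invariance described in the entry.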


small motion model: A class of math- 
ematical models representing very 
small (ideally, infinitesimal) camera- 
scene motion between frames. Used 
typically in shape from motion. See 
also optical flow. [LAT02] 


smart camera: A hardware device incor- 
porating a camera and an on-board 
computer in a single, small container, 
thus achieving a programmable vision 
system within the size of a normal 
video camera. [TV98:2.3.1] 


smooth motion curve: The curve
defined by a motion that can be
expressed by smooth (that is, differen- 
tiable: derivatives of all orders exist) 
parametric functions of the image 
coordinates. Notice that “smooth” is 
often used in an intuitive sense, not 
in the strict mathematical sense above 
(clearly, an exacting constraint), as in 
image smoothing. See also motion and 
motion analysis. [AF02] 


smooth surface: A common-sense term
for a surface with all orders of derivatives
defined at each point in the surface
(a $C^\infty$ surface). In practice, $C^2$
continuity, meaning that at least two
derivatives exist at all points, is
considered smooth. [Wei12:Smooth Function]


smoothed local symmetries: A class 


of skeletonization algorithms, associ- 
ated with Asada and Brady. Given a 2D 
curve that bounds a closed region in 
the plane, the skeleton as computed 
by smoothed local symmetries is the 
locus of chord midpoints of bitangent 
circles. Compare the symmetric axis 
transform. The figure shows two skele- 
ton points as defined by smoothed 
local symmetries: [BA84] 


smoothing: Generally, any modifica- 


tion of a signal intended to remove 
the effects of noise. Often used to 
mean the attenuation of high spatial 
frequency components of a signal. 
As many models of noise have a 
flat power spectral density (PSD), 
while natural images have a PSD that 
decays toward zero at high spatial 
frequencies, suppressing the high 
frequencies increases the overall 
signal-to-noise ratio of the image. 
See also discontinuity preserving
regularization, anisotropic diffusion,
power spectrum and adaptive
smoothing. [FP03:7.1.1]


smoothing filter: Smoothing is often
achieved by using the convolution
operator with a smoothing filter
to reduce noise or high spatial
frequency detail. Such filters include
discrete approximations to the symmetric
probability densities such as the
Gaussian distribution, binomial distribution
and uniform distribution. For
example, in 1D, the discrete signal
$x_1 \ldots x_n$ is convolved with the kernel
$[\frac{1}{6}, \frac{4}{6}, \frac{1}{6}]$ to produce the smoothed signal
$y_1 \ldots y_{n+2}$ in which
$y_i = \frac{1}{6}x_{i-1} + \frac{4}{6}x_i + \frac{1}{6}x_{i+1}$.
[FP03:7.1.1]
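
The 1D example in the smoothing filter entry can be reproduced with a discrete convolution. This sketch assumes NumPy and assumes the three-tap kernel [1/6, 4/6, 1/6]; the impulse input is an illustrative choice that makes the kernel visible in the output.

```python
import numpy as np

# Assumed three-tap smoothing kernel; its weights sum to 1.
kernel = np.array([1.0, 4.0, 1.0]) / 6.0

# Illustrative input: a single impulse of height 6 in a signal of length n = 5.
x = np.array([0.0, 0.0, 6.0, 0.0, 0.0])

# "full" mode returns every overlap position, giving n + 2 output samples.
y = np.convolve(x, kernel, mode="full")
```

Because the kernel weights sum to 1, the smoothed signal has the same total as the input.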


smoothness constraint: An additional 


constraint used in data interpretation 
problems. The general principle is that 
results derived from nearby data must 
themselves have similar values. Tradi- 
tional examples of where the smooth- 
ness constraint can be applied are in 
shape from shading and optical flow. 
The underlying observation that sup- 
ports this computational constraint is 
that the observed real-world surfaces 
and motions are smooth almost every- 
where. [JKS95:9.4] 


snake: The combination of a deformable
model and an algorithm for fitting that
model to image data. In one common
embodiment, the model is a parameterized
2D curve, e.g., a B-spline
parameterized by its control points.
Image data, which might be a gradient
image or 2D points, induces forces
on points on the snake that are translated
to forces on the control points
or parameters. An iterative algorithm
adjusts the control points according
to these forces and recomputes the
forces. Stopping criteria, step lengths
and other optimization details must all
be handled carefully in an
effective snake. [TV98:5.4]


SNR: See signal-to-noise ratio. 


Sobel edge detector: A method of edge
detection based on Sobel kernels. The
edge magnitude image E of an image I is the
square root of the sum of squares
of the convolutions of the image with
horizontal and vertical Sobel kernels,
given by $E = \sqrt{(K_x * I)^2 + (K_y * I)^2}$.
[JKS95:5.2.2]

[Figure: (left) an image and (right) the Sobel operator applied to it]


Sobel gradient operator: See Sobel
kernel.


Sobel kernel: A gradient estimation kernel
used for edge detection. The horizontal
kernel is the convolution of a
smoothing filter, s = [1, 2, 1] in the
horizontal direction and a gradient
operator d = [-1, 0, 1] in the vertical
direction. The kernel

$$K_y = s * d^T = \begin{pmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{pmatrix}$$

highlights horizontal edges. The vertical
kernel $K_x$ is the transpose of $K_y$.
[JKS95:5.2.2]
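
The construction of the Sobel kernels and the edge magnitude can be sketched as follows. This assumes NumPy only; the helper `filter_valid` is a hypothetical name, and it computes a correlation rather than a flipped-kernel convolution, which changes only the sign of each gradient and leaves the magnitude unchanged.

```python
import numpy as np

s = np.array([1.0, 2.0, 1.0])     # smoothing filter
d = np.array([-1.0, 0.0, 1.0])    # gradient operator

# Outer product: d varies down the rows, s along the columns.
K_y = np.outer(d, s)              # responds to horizontal edges
K_x = K_y.T                       # responds to vertical edges


def filter_valid(image, kernel):
    """Correlate kernel with image at positions where it fits entirely."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * image[i:i + out_h, j:j + out_w]
    return out


def sobel_magnitude(image):
    """Edge magnitude: square root of the sum of squared responses."""
    gx = filter_valid(image, K_x)
    gy = filter_valid(image, K_y)
    return np.sqrt(gx ** 2 + gy ** 2)


# Illustrative 5 x 5 image with a horizontal step edge.
image = np.zeros((5, 5))
image[3:, :] = 1.0
E = sobel_magnitude(image)
```

Only the rows of the output whose 3 × 3 window straddles the step have a nonzero magnitude.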


soft mathematical morphology: An 


extension of gray scale mathematical 
morphology in which the min and max 
operations are replaced by other rank 
operations e.g., replacing each pixel 
in an image by the 90th percentile 
value in a 5x5 window centered 
at the pixel. Weighted ranks may be 
computed. See also fuzzy morphology. 
[GAT98] 
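
The percentile example in the soft mathematical morphology entry can be sketched as a rank filter. This assumes NumPy; edge replication at the image border and the unweighted rank are illustrative assumptions, and the plain double loop favors clarity over speed.

```python
import numpy as np


def rank_filter(image, size=5, percentile=90.0):
    """Replace each pixel by a percentile of its size x size neighborhood."""
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")   # assumed border handling
    out = np.empty(image.shape, dtype=float)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            window = padded[r:r + size, c:c + size]
            out[r, c] = np.percentile(window, percentile)
    return out


# Illustrative image: one isolated bright pixel on a dark background.
image = np.zeros((7, 7))
image[3, 3] = 10.0

soft = rank_filter(image, size=5, percentile=90.0)    # soft dilation
hard = rank_filter(image, size=5, percentile=100.0)   # ordinary max (dilation)
```

The ordinary dilation spreads the isolated pixel over a 5 × 5 block, whereas the 90th percentile ignores it, which shows why rank operations are less sensitive to outliers than min and max.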


soft morphology: See soft
mathematical morphology.


soft vertex: A point on a polyline whose 


connecting line segments are almost 
collinear. Soft vertices may arise from 
segmentation of a smooth curve into 
line segments. They are called “soft” 
because they may be removed if the 
segments of the polyline are replaced 
by curve segments. [JKS95:6.6] 


solid angle: A property of a 3D object: 


the amount of the unit sphere’s sur- 
face that the object’s projection onto 
the unit sphere occupies. The unit 
sphere’s surface area is $4\pi$, so the
maximum value of a solid angle is $4\pi$
steradians. [FP03:4.1.2]

[Figure: the solid angle subtended by an object]


source: An emitter of energy that illu- 
minates the vision system’s sensors. 
[WP:Light_source#Light_sources] 


source geometry: See light source 
geometry. 


source image: The image on which an 
image processing or an image analysis 
operation is based. [PGB03]

[Figure: a source image and the resulting target image]


source placement: See light source 
placement. 


space carving: A method for creating a 
3D volumetric representation from 2D 
images. Starting from a voxel represen- 
tation in which a 3D cube is marked 
“occupied”, voxels are removed if they 
fail to provide photo-consistency in the 
set of 2D images in which they appear. 
The order in which the voxels are pro- 
cessed is a key aspect of space carv- 
ing, as it allows otherwise intractable 
visibility computations to be avoided. 
[KS00]


space curve: A curve that may follow
a path in 3D space (i.e., it is
not restricted to lying in a plane).
[WP:Space_curve#Topology] 


space-time cuboid: A block of time-varying
image data concatenated to
form a 3D (or higher) solid. The term
“cuboid” refers to a generally small subset
of the full dataset. For example,
given a space-time interest point from
a video sequence, one could construct
a space-time cuboid by concatenating
an N × N neighborhood of data from
each of the T previous and succeeding
frames, plus the current frame, to form
an N × N × (2T + 1) block of data.
This data might then be analyzed to
create a unique descriptor for similar
space-time patterns. [BPSK11]
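
Extracting such a cuboid from a video volume amounts to array slicing. The sketch below assumes NumPy and a video stored as an array indexed (frame, row, column), so the block comes out with shape (2T + 1, N, N); the function name and the requirement that N be odd are illustrative assumptions.

```python
import numpy as np


def space_time_cuboid(video, t, row, col, n, T):
    """Return the n x n neighborhood of (row, col) over frames t-T .. t+T."""
    if n % 2 == 0:
        raise ValueError("n must be odd so the window is centered")
    half = n // 2
    frames, height, width = video.shape
    inside = (T <= t < frames - T
              and half <= row < height - half
              and half <= col < width - half)
    if not inside:
        raise ValueError("cuboid would extend outside the video volume")
    return video[t - T:t + T + 1,
                 row - half:row + half + 1,
                 col - half:col + half + 1]


# Illustrative video: 10 frames of 8 x 8 pixels with distinct values.
video = np.arange(10 * 8 * 8).reshape(10, 8, 8)
cuboid = space_time_cuboid(video, t=5, row=4, col=4, n=3, T=2)
```

The central element of the cuboid is the pixel at the interest point itself.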


space-time descriptor: A descriptor of
image behavior that incorporates both 
spatial and temporal elements. For 
example, it could be based on proper- 
ties of a space-time cuboid positioned 
at a space-time interest point, or 
the cumulative motion history image. 
These descriptors are used particu- 
larly for behavior analysis and behavior 
classification. [BGX09] 


space-time interest point: An interest 
point which is distinctive because of 
both the spatial and temporal proper- 
ties of the pixels in its neighborhood. 
These points can be used for action 
recognition or feature point tracking. 
[BGX09] 


space-variant processing: Distributing 
the image processing power unevenly 
(in the geometric sense) around the 
image, e.g., with a log-polar image or 
when the processing is concentrated 
at the fovea or at a region of interest.
[MvSG90] 


space-variant sensor: A sensor in
which the pixels are not uniformly 
sampling the projected image data. 
For example, a log-polar image sensor 


has rings of pixels of exponentially 
increasing size as one moves radi- 
ally from the central point: [WP: 
Space_Variant_Imaging#Foveated_ 
sensors] 


sparse coding: A method for describing 


some data by using a few instances of 
a large set of descriptors (or the fir- 
ing of a few of many neurons). For 
example, a set of Gabor filters of differ- 
ent sizes, scales and orientations could 
be the descriptor set. Then a partic- 
ular image patch could be described 
in terms of the nonzero coefficients 
of a few Gabor filters, selected so as 
to reconstruct the patch well. See also 
sparse representation. [Mur12:11.8] 


sparse data: 1) Data containing many 


zero entries. 

2) Data in which there are few exam- 
ples of a specific configuration, so that 
the estimation of probabilities related 
to this configuration becomes unreli- 
able. [Mur12:3.5.4] 


sparse graphical model: A graphical 
model in which each random variable 
depends on only a small number of 
other variables. [HTF08:Ch. 17]


sparse representation: Given a large
vocabulary of N possible image
descriptors or image features, one can
describe an object or image by a
binary vector of length N that indicates
which of the features apply to the
object. Since only a few will typically
apply, the description is sparse. The
representation can be extended to
include encoding the presence of
relations. See also sparse coding.
[AR02]


sparsity problem: A general machine 


learning problem where there is insuf- 
ficient data to accurately or completely 
estimate a model. This could be a 
statistical model where the probabili- 
ties are inaccurate for sparse events, 
or a graphical model where there are 
missing links between nodes. See also 
sparse data. [Mur12:3.5.4] 


spatial angle: The area on a unit sphere 
that is bounded by a cone with its apex 
in the center of the sphere: 


Measured in steradians. This is 
frequently used when analyzing 
luminance. [WP:Solid_angle] 


spatial averaging: The pixels in the out- 


put image are weighted averages of 
their neighboring pixels in the input 
image. Mean smoothing and Gaussian 
smoothing are examples of spatial aver- 
aging. [Jai89:7.4] 


spatial domain smoothing: An imple- 


mentation of smoothing in which each 
pixel is replaced by a value that is 
directly computed from other pixels 
in the image. In contrast, smooth- 
ing with a frequency domain filter 
first processes all pixels to create a 
linear transformation of the image, 
such as a Fourier transform and 
expresses the smoothing operation 
in terms of the transformed image. 
[Sch89:Ch. 4] 


spatial frequency: The rate of repetition
of intensities across an image. In
a 2D image, the space to which “spatial”
refers is the image’s X-Y plane.
The figure has significant repetition at
a spatial frequency of b pixel$^{-1}$ in the
horizontal direction.
The 2D Fourier transform represents
spatial frequency contributions in all
directions, at all frequencies. A discrete
approximation is efficiently computed
using the fast Fourier transform (FFT).
[Hec87:7.7]


spatial hashing: See spatial indexing.
[WP:Spatial_index]


spatial indexing: 1) Conversion of a
shape to a number, so that it may be
quickly compared to other shapes. Intimately
linked with the computation
of invariants to spatial transformations
and imaging distortions of the shape.
For example, a shape represented as
a collection of 2D boundary points
might be indexed by its compactness.
2) The design of efficient data structures
for search and storage of geometric
quantities. For example, closest-point
queries are made more efficient
by the computation of spatial
indices such as the Voronoi
diagram, distance transform, k-D trees
or binary space partitioning (BSP)
trees. [WP:Spatial_index]


spatial light modulator: A programmable
optical filter (often based
on liquid crystal technology) used to
control the amount of light that passes
through the filter. Can be found in data
projectors. Another application is in
optical image processing, where the filter
is placed in the Fourier transform
plane to allow frequency domain
filtering. [WP:Spatial_light_modulator]


spatial matched filter: See matched
filter.


spatial normal fields: Adjacent points
on a surface have spatially adjacent
surface normals, resulting in a field
of surface normals. These fields might
arise from shape from shading algorithms.
[Bal11]


spatial occupancy: A form of object
or scene representation in which a
3D space is divided into a grid of
voxels. Voxels containing a part of the
object are marked as being occupied
and other voxels are marked as free
space. This representation is particularly
useful for tasks where properties
of the object are less important
than simply the presence and position
of the object, as in robot navigation.
[JKS95:15.3.2]


spatial proximity: The distance
between two structures in real space
(as contrasted with proximity in a
feature or property space). [JKS95:3.1]


spatial pyramid matching: A form of
pyramid matching, in which the hierarchical
subdivision is spatial, e.g., by
recursively subdividing by a factor of
two in each direction. The advantage
over pyramid matching is that it
takes account of the spatial distribution
of the different features. At each
level and spatial subdivision, a form
of bag-of-words matching takes place,
and then the matching results are
combined hierarchically with weights
related to the size of the spatial region.
[LSP06]


spatial quantization: The conversion
of a signal defined on an infinite
domain to a finite set of limited-precision
samples. For example, the
function $f(x, y): \mathbb{R}^2 \to \mathbb{R}$ might be
quantized to the image g, of width
w and height h, defined as
$g(i, j): \{1..w\} \times \{1..h\} \to \mathbb{R}$. The value of a
particular sample g(i, j) is determined
by the point-spread function p(x, y),
and is given by
$g(i, j) = \iint p(x - i, y - j)\, f(x, y)\, dx\, dy$.
[Umb98:2.2.4]


spatial reasoning: Inference from geometric
rather than symbolic or linguistic
information. See also geometric
reasoning. [Fra92]


spatial relation: An association of two


or more spatial entities, expressing the 
way in which such entities are con- 
nected or related. Examples include 
perpendicularity or parallelism of lines 
or planes, and inclusion of one image 
region in another. [BKKP05:5.8] 


spatial resolution: The smallest separa- 
tion between distinct signal features 
that can be measured by a sensor. 
For a CCD camera, this is dictated by 
the distance between adjacent pixel 
centers. It is often specified as the 
angle between the 3D rays corresponding
to adjacent pixels. It can also be
expressed as the inverse of
the highest spatial frequency that a
sensor can represent without aliasing.
[JKS95:8.2] 


spatial statistics: The statistical analy- 
sis of patterns that occur in space. 
In the case of image analysis, the 
statistics could refer, for example, to 
the image properties or to the distri- 
bution of image features or objects. 
[Cre93] 


spatio-temporal analysis: The analysis 
of moving images by processing that 
operates on the 3D volume formed by 
the stack of 2D images in a sequence. 
Examples include kinetic occlusion, 
the epipolar plane image (EPI) and
spatio-temporal autoregressive models 
(STAR). [WSK84] 


spatio-temporal relationship match: 
Given a set of feature points in 
spatio-temporal space, found e.g., by 
the FAST interest point detector, one 
can describe the spatial relation (e.g., 
“near”) and temporal relation (e.g., 
“before”) between a pair of points.
These together form a rich description
of activities, particularly involving
multiple agents. Matching instances
of these complex descriptions allows
behavior classification. [RA09]


spatio-temporal space: A representa- 
tion for a portion or all of a video 
sequence, usually 3D, in which the 2D 
video frames are stacked on top of each 
other to form a 3D volume.
This can be generalized to 4D, e.g.,
by combining 3D datasets (e.g., MRI)
captured over time. One can also treat 
the spatio-temporal space as continu- 
ous, e.g., when analyzing differential 
geometry. 


special case motion: A subproblem 


of the general structure from motion 
problem, where the camera motion 
is known to be constrained a pri- 
ori. Examples include planar motion 
estimation, turntable motion (single- 
axis rotation), and pure translation. In 
each case, the constrained motion sim- 
plifies the general problem, yielding 
one or more of: closed-form solutions, 
greater efficiency, increased accu- 
racy. Similar benefits can be obtained 
from approximations such as the 
affine camera and weak perspective. 
[Saw94a] 


specificity: A binary classifier c(x) 


returns + or — labels for an exam- 
ple x. Comparing these predictions to 
the actual label gives rise to a true 
positive (TP), true negative (TN), false 
positive (FP) or false negative (FN). 
The specificity is defined as the true
negative rate, i.e., TN/(TN + FP), or
the proportion of examples whose actual
label is negative that are correctly
predicted as negative. The term is
mainly used in medical contexts. See 
also sensitivity. [HTF08:9.2] 
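
The definition TN/(TN + FP) can be computed directly from paired labels. This is a small sketch in plain Python; representing the + and - labels as booleans and the example data are assumptions made for illustration.

```python
def specificity(actual, predicted):
    """True negative rate TN / (TN + FP) for boolean labels (True = +)."""
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    if tn + fp == 0:
        raise ValueError("specificity is undefined without actual negatives")
    return tn / (tn + fp)


# Illustrative data: four actual negatives, one of them predicted positive.
actual = [True, True, False, False, False, False]
predicted = [True, False, False, False, True, False]
spec = specificity(actual, predicted)   # TN = 3, FP = 1
```

Note that the false negative in the example (an actual positive predicted as negative) does not affect the specificity; it would lower the sensitivity instead.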


speckle: A pattern of light and dark spots
superimposed on the image of a scene
that is illuminated by coherent light,
such as from a laser. Rough surfaces
in the scene change the path lengths
and thus the interference effects of different
rays, so a fixed scene, laser and
imager configuration results in a fixed
speckle pattern on the imaging surface.
[Jai89:8.13]

[Figure: a laser source illuminating a rough surface; beam interference
gives light/dark spots on the imaging surface (e.g., CCD array)]


speckle reduction: Restoration of 


images corrupted with speckle noise, 
such as laser or ultrasound images. 
[Jai89:8.13] 


SPECT: See single-photon emission
computed tomography.


spectral analysis: 1) Analysis per- 
formed in the spatial, temporal or elec- 
tromagnetic frequency domains. 
2) Generally, any analysis that involves 
the examination of eigenvalues. This 
is a nebulous concept, and con- 
sequently the number of “spectral 
techniques” is large. Often equiva- 
lent to principal component analysis. 
[WP:Spectral_theory] 


spectral clustering: A form of graph 
theoretic clustering. The similarities 
between pairs of data points are 
recorded in a square similarity matrix. 
Spectral graph partitioning is then 
run to obtain a clustering. [HTF08:14.5.3]


spectral decomposition method: See 
spectral analysis. [WP:Eigendecompo- 
sition_of_a_matrix] 


spectral density function: See power 
spectrum. 


spectral distribution: The spatial
power spectrum or electromagnetic
spectrum distribution. [JMW+64]


spectral factorization: A method for
designing linear filters based on difference
equations that have a given
spectral density function when applied
to white noise. [Jai89:6.3]


spectral filtering: Modifying the light
before it enters the sensor by using
a filter tuned to different spectral frequencies.
A common use is with laser
sensing, in which the filter is chosen to
pass only light at the laser’s frequency.
Another usage is to eliminate ambient
infrared light in order to increase
the sharpness of an image (as most
silicon-based sensors are also sensitive
to infrared light). [Buc87]


spectral frequency: Electromagnetic
spectrum or spatial frequency.
[Hec87:7.7]


spectral graph partitioning: A graph
partitioning obtained using eigenanalysis
of a matrix associated with the
graph. Shi and Malik have used it for
image segmentation. [SM00]


spectral graph theory: The study of
the properties of graphs revealed by
an eigenanalysis of associated matrices,
e.g., the adjacency matrix or graph
Laplacian. [Chu97]


spectral reflectance: See reflectance.


spectral response: The response R
of an imaging sensor illuminated by
monochromatic light of wavelength λ
is the product of the input light intensity
I and the spectral response at
that wavelength s(λ), so $R = I\,s(\lambda)$.
[OGFN05]


spectrum: A range of values such
as the electromagnetic spectrum.
[WP:Spectrum]


specular reflection: Mirror-like reflection
or highlight. Formed when a light
source at 3D location L, surface point
P, surface normal N at that point and
camera center C are all coplanar, and
the angles LPN and NPC are equal.
[FP03:4.3.4-4.3.5]

[Figure: specular reflection geometry showing the light source and the camera]


specularity: See specular reflection.


sphere: 1) A surface in any dimension 
defined by x such that $\|x - c\| = r$ for
a center c and radius r.
2) The volume of space bounded by
the above, or x such that $\|x - c\| < r$.
[WP:Sphere] 


spherical: Having the shape of, char- 
acteristics of or associations with a 
sphere. [WP:Sphere] 


spherical aberration: A form of optical 
distortion that arises from the use of 
spherical lenses, rather than aspherical 
lenses tuned to the index of refraction 
of the glass. The result of the aberration 
is that parallel incoming light rays do 
not focus at a point; instead rays hitting 
the lens at different distances from the 
optical axis focus at different distances: 
[FP03:p. 11]


spherical harmonic: A function defined
on the unit sphere, of the form

$$Y_l^m(\theta, \phi) = \eta_{lm} P_l^m(\cos\theta)\, e^{im\phi},$$

where $\eta_{lm}$ is a normalizing factor, and
$P_l^m$ is a Legendre polynomial. Any real
function $f(\theta, \phi)$ defined on the sphere
has an expansion in terms of the spherical
harmonics of the form

$$f(\theta, \phi) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} \alpha_l^m Y_l^m(\theta, \phi).$$

That is analogous to the Fourier expansion
of a function defined on the
plane, with the $\alpha_l^m$ analogous to
the Fourier coefficients. Polar plots
of the first ten spherical harmonics,
for m = 0...2, l = 0...m, show
$r = 1 + Y_l^m(\theta, \phi)$ in polar coordinates.
[BB82:9.2.3]


spherical mirror: Sometimes used in
catadioptric cameras. A mirror whose 
shape is a portion of a sphere. 
[WP:Spherical_mirror#Mirror_shape] 


spherical spin image: A generaliza- 
tion of the spin image to apply 
to freeform surfaces of 3D objects. 
It produces a local surface shape 
description that can be used for
image database indexing and object
recognition. [RSM01] 


sphericity ratio: A measure in [0, 1] of 
how close a volume is to a perfect 
sphere (1.0). If A and V are the surface 
area and volume of the shape respec- 
tively, then its sphericity is: 


$$\frac{\pi^{1/3} (6V)^{2/3}}{A}$$


[WP:Sphericity] 
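
The ratio can be checked numerically. The following is a minimal sketch, assuming the standard form pi^(1/3) (6V)^(2/3) / A; the function name `sphericity` is illustrative and not from the source.

```python
import math

def sphericity(area, volume):
    """Return pi^(1/3) * (6V)^(2/3) / A for a closed shape
    with surface area A and volume V."""
    return math.pi ** (1.0 / 3.0) * (6.0 * volume) ** (2.0 / 3.0) / area

# Unit sphere: A = 4*pi, V = 4*pi/3.
unit_sphere = sphericity(4.0 * math.pi, 4.0 * math.pi / 3.0)
# Unit cube: A = 6, V = 1; the ratio reduces to (pi/6)^(1/3).
unit_cube = sphericity(6.0, 1.0)
```

A sphere gives a ratio of 1, and any other shape of the same volume has a larger surface area and so a smaller ratio.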


spin image: A local surface representation
of Johnson and Hebert. At
selected points p with surface normal
n, all other surface points x can be
represented on a 2D basis as
$(\alpha, \beta) = \left(\sqrt{\|x - p\|^2 - (n \cdot (x - p))^2},\; n \cdot (x - p)\right)$.
The spin image is the histogram
of all the $(\alpha, \beta)$ values for the surface.
Each selected point p leads
to a different spin image. Points are
matched by comparing their spin images
using correlation. Key advantages of the
representation are that it is independent
of pose and it avoids ambiguities of 
representation that can occur with 
nearly flat surfaces. [FP03:21.4.2] 
[JH99] 
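
As an illustration of the construction, the sketch below computes the (alpha, beta) coordinates and accumulates them into a histogram. It assumes a unit-length normal n and a square bin size; the function names, the bin layout (beta rows centered on zero) and the parameters `bin_size`, `n_alpha` and `n_beta` are illustrative choices, not taken from Johnson and Hebert's implementation.

```python
import math

def spin_coordinates(p, n, x):
    """Return (alpha, beta) of surface point x relative to the oriented
    point (p, n). The normal n is assumed to have unit length."""
    d = [xi - pi for xi, pi in zip(x, p)]
    beta = sum(ni * di for ni, di in zip(n, d))      # signed height along n
    sq = sum(di * di for di in d) - beta * beta      # squared radial distance
    alpha = math.sqrt(max(sq, 0.0))                  # clamp rounding noise
    return alpha, beta

def spin_image(p, n, points, bin_size, n_alpha, n_beta):
    """Histogram of (alpha, beta) over all points; rows index beta
    (centered on zero), columns index alpha."""
    image = [[0] * n_alpha for _ in range(n_beta)]
    for x in points:
        alpha, beta = spin_coordinates(p, n, x)
        col = int(alpha / bin_size)
        row = int(math.floor(beta / bin_size)) + n_beta // 2
        if 0 <= col < n_alpha and 0 <= row < n_beta:
            image[row][col] += 1
    return image
```

Because alpha and beta depend only on distances measured relative to p and n, rotating or translating the whole surface leaves the histogram unchanged.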


splash: An invariant representation of 
the region about a 3D point. It gives 
a local shape representation useful for 
position-invariant object recognition. 
[SM92] 


spline: 1) A curve c(t) defined as a
weighted sum of control points:
$c(t) = \sum_{i=0}^{n} w_i(t) p_i$, where the control points
are $p_0, \ldots, p_n$ and one weighting (or “blending”)
function $w_i$ is defined for each
control point. The curve may interpolate
the control points or approximate
them. The construction of the
spline offers guarantees of continuity
and smoothness. With uniform
splines the weighting functions for
each point are translated copies of
each other, so $w_i(t) = w_0(t - i)$. The
form of $w_0$ determines the type of
spline: for B-splines and Bezier curves,
$w_0(t)$ is a polynomial (typically cubic)
in t. Nonuniform splines reparameterize
the t axis, c(t) = c(u(t)), where
u(t) maps the integers k = 0..n to knot
points $t_0, \ldots, t_n$ with linear interpolation
for non-integer values of t. Rational
splines with n-D control points are perspective
projections of normal splines
with (n + 1)-D control points.

2) Tensor-product splines define a 3D
surface x(u, v) as a product of splines
in u and v. [JKS95:6.7]
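
The uniform case can be sketched in a few lines. The example below assumes the uniform cubic B-spline blending function for w_0 (a piecewise cubic with support on [-2, 2]); the names `w0` and `spline_point` are illustrative.

```python
def w0(t):
    """Uniform cubic B-spline blending function centered at 0."""
    u = abs(t)
    if u < 1.0:
        return 2.0 / 3.0 - u * u + 0.5 * u ** 3
    if u < 2.0:
        return (2.0 - u) ** 3 / 6.0
    return 0.0

def spline_point(control_points, t):
    """Evaluate c(t) = sum_i w0(t - i) * p_i for control points p_0..p_n."""
    dims = len(control_points[0])
    return tuple(
        sum(w0(t - i) * p[k] for i, p in enumerate(control_points))
        for k in range(dims)
    )

# Control points on a straight line; away from the ends of the
# sequence the curve reproduces the line.
line = [(float(i), 2.0 * i) for i in range(6)]
mid = spline_point(line, 2.5)
```

Each weight is a translated copy of `w0`, and away from the ends of the control sequence the weights sum to one, so the curve is an affine combination of nearby control points. This variant approximates the control points rather than interpolating them.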


spline smoothing: Smoothing of a discretely
sampled signal x(t) by replacing
the value at $t_i$ by the value predicted
at that point by a spline $\hat{x}(t)$ fitted
to neighboring values. [Jai89:8.7]


split and merge: A two-stage procedure
for segmentation or clustering.
The data is divided into subsets, with 
the initial division being a single set 
containing all the data. In the split 
stage, subsets are repeatedly subdi- 
vided depending on the extent to 
which they fail to satisfy a coher- 
ence criterion (e.g., similarity of pixel 
colors). In the merge stage, pairs of 
adjacent sets are found that, when 
merged, again satisfy a coherence crite- 
rion. Even if the coherence criteria are 
the same for both stages, the merge 
stage may still find subsets to merge. 
[Nal93:3.3.2] 


SPOT: Système Probatoire de
l'Observation de la Terre. A series of
satellites launched by France that are 
a common source of satellite images 
of the earth. SPOT-5 was launched 
in May 2002 and provides complete 
coverage of the earth every 26 days. 
[WP:SPOT (satellite)] 


spot detection: An image-processing
operation for finding
small bright or dark locations against 
contrasting backgrounds. The issues 
are the size of spot and the amount of 
contrast. [RT71] 


spur: A short segment attached to a more
significant line or edge.
Spurs often arise when linear structures
are tracked through noisy data,
such as in edge detection. [SOS00:5.2]


squared error clustering: A class of 
clustering algorithms that attempt 
to find cluster centers $c_1, \ldots, c_k$
that minimize the squared error
$\sum_{x \in X} \min_{i \in \{1, \ldots, k\}} \|x - c_i\|^2$, where X is
the set of points to be clustered. See
also: k-means. [Mir99]
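
A minimal sketch of the criterion, together with one k-means style update that cannot increase it. The function names are illustrative, and the update shown is the usual Lloyd iteration rather than anything specific to the cited source.

```python
def sq_dist(x, c):
    return sum((a - b) ** 2 for a, b in zip(x, c))

def squared_error(points, centers):
    """Sum over all points of the squared distance to the nearest center."""
    return sum(min(sq_dist(x, c) for c in centers) for x in points)

def lloyd_step(points, centers):
    """Assign each point to its nearest center, then move each center
    to the mean of its assigned points (empty clusters stay put)."""
    groups = [[] for _ in centers]
    for x in points:
        nearest = min(range(len(centers)), key=lambda i: sq_dist(x, centers[i]))
        groups[nearest].append(x)
    updated = []
    for group, center in zip(groups, centers):
        if group:
            updated.append(tuple(sum(col) / len(group) for col in zip(*group)))
        else:
            updated.append(tuple(center))
    return updated

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers = [(0.0, 0.0), (10.0, 0.0)]
before = squared_error(points, centers)
after = squared_error(points, lloyd_step(points, centers))
```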


stadimetry: The computation of dis- 
tance to an object of known size based 
on its apparent size in the camera’s 
field of view. [WP:Stadimeter] 


standard illuminant: A published stan- 
dard spectrum (usually internationally 
agreed by the CIE) for a given sit- 
uation, e.g., an incandescent lamp, 
which allows color researchers and 
manufacturers to use an agreed color 
spectral distribution for an illuminant. 
[WP:Standard_illuminant] 


state inference: Given a system that 
could be in more than one state (e.g., 
a person walking, running or stand- 
ing), state inference is the process 
of deciding which of the states the 
system is in. The algorithm could be 
rule-based, probabilistic or based on 
fuzzy reasoning etc. 


state space model: A state space model 
is a probabilistic model with a hidden 
state variable evolving dynamically in 
time. Observations are made accord- 
ing to an observation probability dis- 
tribution, giving partial information 
about the hidden state. Canonical 
examples of state space models are 
the hidden Markov model (HMM) 
with discrete state and the Kalman 
filter with continuous state. [Bis06: 
13.1] 


state transition probability: In a 
Markov process in discrete time, the 
state transition probability distribution 
specifies $p(z_{t+1} | z_t)$, where $z_t$ denotes
the state at time t. [Bis06:13.2]


state vector: In a Kalman filter, the 
hidden state variable is a continuous- 
valued vector variable known as the 
state vector. [Bis06:13.3]


stationary camera: A camera whose
optical center does not move. The
camera may pan, tilt and rotate about 
its optical center, but it may not 
translate. Images taken by a station- 
ary camera are always related by a 
planar homography. Also known as 
a “rotating” or “non-translating” cam- 
era. The term may also refer to a 
camera that does not move at all. 
[JKKW06] 


statistical behavior model: A statistical
form of a behavior model,
such as a hidden Markov model. 
An advantage of the statistical form 
is that one can keep an estimate 
of the probability of a hypothesized 
behavior. 


statistical classifier: A function map- 
ping from a space of input data 
to a set of labels. Input data
are points $x \in R^n$ and labels are
scalars. The classifier c(x) = l assigns
the label l to point x. The classifier
is typically a parameterized
function, such as a neural network
(with weights as parameters) or a
support vector machine. The classifier
parameters could be set by optimizing
performance on a training
set of known (x, l) pairs or by
a self-organizing learning algorithm. 
[Jai89:9.14] 


statistical distribution: A description of 
the relative number of times each pos- 
sible outcome will occur for a given 
variable over a number of trials. For 
example, for a fair die, each value will
occur, on average, one sixth of the time. See
also probability distribution. 


statistical fusion: A general term for 
estimating a value A (and possibly its 
distribution) from a set of other val- 
ues {B;} and their distributions {D;}. 
A simple example is: given two estimates
of a value, $V_1$ and $V_2$, with associated
probabilities $p_1$ and $p_2$, a
fused estimate is $(p_1 V_1 + p_2 V_2)/(p_1 + p_2)$.
The Kalman filter is a more sophisticated
algorithm for statistical fusion.
[Gus10] 
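
The simple two-estimate case can be written directly. This is only a sketch of the weighted average given in the entry; the function name `fuse` is illustrative.

```python
def fuse(v1, p1, v2, p2):
    """Probability-weighted average of two estimates of the same value."""
    return (p1 * v1 + p2 * v2) / (p1 + p2)

# An estimate of 10 held with weight 0.75 and an estimate of 20 held
# with weight 0.25 fuse to a value closer to the first estimate.
fused = fuse(10.0, 0.75, 20.0, 0.25)
```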


statistical independence: The random
variables X and Y are said
to be independent if their joint
probability distribution p(X, Y)
factorizes as p(X)p(Y). See also
conditional independence. [Bis06:p. 17]


statistical insufficiency problem: The
problem of having insufficient training
examples to be able to estimate
a sound statistical model for some 
phenomenon. This occurs particularly 
in unusual behavior detection, where 
there are many examples of com- 
mon normal behaviors, but there may 
be only a few or no examples of 
the many different abnormal behav- 
iors. See also sparse data. [GX11: 
p. 252] 


statistical learning: The process of
using data and a statistical model to
make inferences about the distribution 
that generated the data. Supervised 
learning and unsupervised learning are 
the major categories. [Was04:Ch. 6] 


statistical model: A statistical
model for a dataset is a set of 
probability distributions. A parametric 
model has a finite number of parame- 
ters; a non-parametric model cannot 
be parameterized by a finite number 
of parameters. Examples of para- 
metric models include the Gaussian 
distribution and linear regression 
with a probabilistic error model. 
[Was04:Ch. 6] 


statistical moment: There are a number
of statistical moments that provide
information regarding the distribution
and shape of statistical data (i.e.,
a probability distribution). Common
ones are the infinite family $\int x^n p(x)\, dx$
for positive integers n > 0 and a probability
distribution p(x). For n = 1
we obtain the mean of the distribution.
Moments are a special case
of expectation values. Another family
are histogram moments. [Was04:p. 49]


statistical pattern recognition: Pattern
recognition that depends on classification
rules learned from examples
rather than constructed by
designers. Compare structural pattern
recognition. [Sch89:Ch. 6]


statistical shape model: A parameterized
shape model where the parameters
are assumed to be random variables
drawn from a known probability
distribution. The distribution
is learned from training examples. 
Examples include point distribution 
models. See also active shape model. 
[WP:Point_distribution_model] 


statistical shape prior: Shape priors 
are useful for constraining the estima- 
tion of shapes from noisy or under- 
constraining image data, e.g., to stabi- 
lize the estimation or to ensure that 
the recovered shape comes from a 
given family of shapes. The statistical 
aspect adds a bias towards the recov- 
ered shape having higher probability 
under the prior distribution. Examples 
include the active shape model and the 
point distribution model. [Cre06] 


statistical texture: A texture whose 
description is in terms of the statis- 
tics of image neighborhoods. General 
examples are co-occurrence statistics 
of pairs of neighboring pixels, Fourier 
texture descriptors, autocorrelation 
and autoregressive models. A specific 
example is the statistics of the distri- 
bution of entries in 5 x 5 neighbor- 
hoods. These statistics may be learned 
from a set of training images or auto- 
matically discovered via clustering. 
[Nev82:8.3.1] 


steerable filter: A filter applied to a 2D
image, whose response is dependent
on a scalar “orientation” parameter θ,
but for which the response at any arbitrary
value of θ may be computed as
a function of a small number of basis
responses, thus saving computation.
For example, the directional derivative
at orientation θ may be computed in
terms of the x and y derivatives $I_x$ and
$I_y$ as

$$\frac{dI}{dn_\theta} = \cos\theta\, I_x + \sin\theta\, I_y.$$


For non-steerable filters, such as the 
Gabor filter, the response must be 
computed at each orientation, leading 
to higher computational complexity. 
[SF96] 
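
The derivative example can be sketched as follows: two basis responses are computed once per pixel and then combined for any orientation. The central-difference estimator and the function names are assumptions made for illustration only.

```python
import math

def basis_derivatives(image, x, y):
    """Central-difference x and y derivatives at pixel (x, y);
    the image is indexed as image[y][x]."""
    ix = (image[y][x + 1] - image[y][x - 1]) / 2.0
    iy = (image[y + 1][x] - image[y - 1][x]) / 2.0
    return ix, iy

def steered_derivative(ix, iy, theta):
    """Directional derivative at orientation theta from the two basis responses."""
    return math.cos(theta) * ix + math.sin(theta) * iy

# A planar intensity ramp I(x, y) = 2x + 3y has Ix = 2 and Iy = 3 everywhere.
ramp = [[2.0 * x + 3.0 * y for x in range(5)] for y in range(5)]
ix, iy = basis_derivatives(ramp, 2, 2)
responses = [steered_derivative(ix, iy, math.radians(a)) for a in (0, 45, 90)]
```

Only two filtering passes are needed regardless of how many orientations are queried, which is the computational saving the entry describes.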


steganography: Concealing of information
in non-suspect “carrier” data.
For example, encoding information in 
the low-order bits of a digital image. 
[WP:Steganography] 
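
A minimal sketch of the low-order-bit example, assuming 8-bit pixel values held in a flat list; the function names `embed` and `extract` are illustrative.

```python
def embed(pixels, bits):
    """Write each message bit into the lowest-order bit of successive pixels."""
    if len(bits) > len(pixels):
        raise ValueError("carrier is too small for the message")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit
    return stego

def extract(pixels, n_bits):
    """Read back the lowest-order bit of the first n_bits pixels."""
    return [p & 1 for p in pixels[:n_bits]]

carrier = [200, 13, 77, 254, 0, 255, 128, 64]
message = [1, 0, 1, 1, 0, 0, 1, 0]
stego = embed(carrier, message)
```

Each pixel changes by at most one gray level, which is why the carrier image looks unaltered to a viewer.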


step edge: 1) A discontinuity in image
intensity (compare with fold edge).

2) An idealized model of a step-change
in intensity. The figure shows a plot of
intensity I against x position with a
step edge discontinuity in intensity I
at a. [JKS95:Ch. 5]


steradian: The unit of solid angles.
[FP03:4.1.2]


stereo: General term for a class of problems
in which multiple images of the
same scene are used to recover a 
3D property such as surface shape, 
orientation or curvature. In binocular 
stereo, two images are taken from dif- 
ferent viewpoints allowing the com- 
putation of 3D structure. In trifo- 
cal, trinocular stereo and multi-view 
stereo, three or more images are avail- 
able. In photometric stereo, the view- 
point is the same, but lighting condi- 
tions are varied in order to compute 
surface orientation. [WP:Stereoscopy] 


stereo camera calibration: The computation
of intrinsic parameters and
extrinsic parameters for a pair of cam- 
eras. Important extrinsic variables are 
relative orientation: the rotation and 
translation relating the two cameras. 
Calibration can be achieved in several 
ways: conventional camera calibration 
of each camera independently; com- 
putation of the essential matrix or 
fundamental matrix relating the pair, 
from which relative orientation may 
be computed along with one or two 
intrinsic parameters; for a rigid stereo 
rig, moving the rig and capturing mul- 
tiple image pairs. [TV98:7.1.3] 


stereo convergence: The angle α
between the optical axes of two sensors
in a stereo configuration. [Stu99]


stereo correspondence problem: The
key to recovering depth from stereo 
is to identify 2D image points that 
are projections of the same 3D scene 
point. Pairs of such image points are 
called “correspondences”. The corre- 
spondence problem is to determine 
which pairs of image points are cor- 
respondences. Unfortunately, match- 
ing features or image neighborhoods 
is usually ambiguous, leading to mas- 
sive amounts of computation and 
many alternative solutions. To reduce 
the space of matches, correspond- 
ing points are usually required to sat- 
isfy some constraints, such as having 
similar orientation and contrast, local 
smoothness or uniqueness of match. 
A powerful constraint is the epipo- 
lar constraint: from a single view, an 
image point is constrained to lie on a 
3D ray, whose projection onto the sec- 
ond image is an epipolar curve. For pin- 
hole cameras, the curve is an epipolar 
line. This greatly reduces the space of 
potential matches. [JKS95:11.2] 


stereo fusion: The ability of the human 
vision system, when presented with 
a pair of stereo images, one to 
each eye independently, to form a 
consistent 3D interpretation of the 
scene, essentially solving the stereo 
correspondence problem. The fact 
that humans can perform fusion even 
on random dot stereograms means that 
high-level recognition is not required 
to solve all stereo correspondence 
problems. [BB82:3.4.2] 


stereo image rectification: For a pair 
of images taken by pinhole cameras, 
points in stereo correspondence lie on 
corresponding epipolar lines. Stereo
image rectification resamples the 2D
images to create two new images, 
with the same number of rows, so 
that points on corresponding epipolar 
lines lie on corresponding rows. This 
reduces computation for some stereo 
algorithms, although certain relative 
orientations (e.g., translation along the 
optical axis) make rectification difficult 
to achieve. [JKS95:12.5] 


stereo matching: See stereo
correspondence problem.


stereo triangulation: Determining the 
3D position of a point given its 2D 
positions in each of two images taken 
by cameras in known positions. In the 
noise-free case, each 2D point defines 
a 3D ray by back projection and the 
3D point is at the intersection of the 
two rays. With noisy data, the optimal 
triangulation is computed by finding 
the 3D point that maximizes the prob- 
ability that the two imaged points are 
noisy projections thereof. Also used 
for the analogous problem in multiple 
views. [WP:Range_imaging#Stereo_ 
triangulation] 
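
For illustration, the sketch below uses the simple midpoint method: it returns the point halfway along the shortest segment joining the two back-projected rays. This is not the optimal (maximum-probability) triangulation described in the entry, only a common closed-form approximation; the function name and the ray parameterization (origin plus direction) are assumptions.

```python
def triangulate_midpoint(o1, d1, o2, d2):
    """Midpoint of the shortest segment joining the rays
    o1 + s*d1 and o2 + t*d2 (camera centers o1, o2)."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    w = [a - b for a, b in zip(o1, o2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; depth is undetermined")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    q1 = [o + s * k for o, k in zip(o1, d1)]   # closest point on ray 1
    q2 = [o + t * k for o, k in zip(o2, d2)]   # closest point on ray 2
    return tuple((u + v) / 2.0 for u, v in zip(q1, q2))

# Noise-free case: both rays pass through the scene point (0.5, 0.2, 4).
exact = triangulate_midpoint((0.0, 0.0, 0.0), (0.5, 0.2, 4.0),
                             (1.0, 0.0, 0.0), (-0.5, 0.2, 4.0))
# Rays that do not meet: the result lies midway between them.
skew = triangulate_midpoint((0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
                            (0.0, 1.0, 1.0), (0.0, 0.0, 1.0))
```

In the noise-free case the two rays intersect and the midpoint coincides with the intersection; with noisy data it gives a reasonable starting value for a more careful estimate.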


stereo vision: The ability to determine 
three-dimensional structure using two 
eyes. See also stereo. [TV98:7.1] 


stimulus: 1) Any object or event that a 
computer vision system may detect. 
2) The perceived radiant energy itself. 
[WP:Stimulus] 


stochastic gradient descent: An opti- 
mization algorithm for minimizing a 
convex cost function. [WP:Stochastic_ 
gradient_descent] 


stochastic completion field: A strategy 
for algorithmic discovery of illusory 
contours. [WJ95] 


stochastic process: A family of random 
variables X(t), where t runs over an
index set. This set could be taken as
the real line (for a continuous-time process),
the integers (for a discrete-time
process) or could index space,
e.g., $R^2$ (for a random field). [GS92:8.1]


stratification: A class of solutions to 
self-calibration in which a projective 
reconstruction is first converted to an 
affine reconstruction (by computing
the plane at infinity) and then to a
Euclidean reconstruction. [HZ00:18.5] 


streaming video: Video presented as 
a sequence of images or frames. An 
algorithm processing such video can- 
not easily select a particular frame. 
[WP:Streaming_media]


stripe ranging: See structured light 
triangulation. 


strobe duration: The time for which 
a strobed light is illuminated. 
[Gal90:2.1.1] 


strobed light: A light that is illuminated 
for a very short period, generally at 
high intensity. [Gal90:2.1.1] 


strong learner: A learner that gives an 
error that is (with high probability) 
arbitrarily close to zero. Contrast with 
weak learner. Boosting is a method to 
combine many weak learners to pro- 
duce a strong learner. [FS97] 


structural description: A represen- 
tation that contains explicit infor- 
mation about object parts and the 
relationships between them. See 
also part-based representation and 
geometric model. [Pal99:8.2.4] 


structural pattern recognition: 
Pattern recognition where classifica- 
tion is achieved using high-level 
rules or patterns, often speci- 
fied by a human designer. See 
also syntactic pattern recognition. 
[WP:Syntactic_pattern_recognition] 


structural texture: A texture that is 
formed by the regular repetition of a 
primitive structure, e.g., an image of 
bricks or windows. [Jai89:9.11] 


structure and motion recovery: The 
simultaneous computation of 3D scene 
structure and 3D camera positions 
from a sequence of images of a 
scene. Common strategies depend on 
tracking of 2D image entities (e.g., 
interest points or edges) through mul- 
tiple views and thus obtaining con- 
straints on the 3D entities (e.g., points 
and lines) and camera motion. Con- 
straints are embodied in entities such 
as the fundamental matrix and trifocal 
tensor, which may be estimated from
image data alone and then allow computation
of the 3D camera positions.
Recovery is up to certain equivalence 
classes of scenes, where any member 
of the class may generate the observed 
data, such as projective reconstruction 
or affine reconstruction. [MM95] 


structure factorization: See motion 
factorization. 


structure from motion: Recovery of 
the 3D shape of a set of scene points 
from their motion. For a more modern 
treatment, see structure and motion 
recovery. [JKS95:14.7] 


structure from optical flow: Recov- 
ery of camera motion by com- 
puting optical flow constrained by 
the infinitesimal motion fundamental 
matrix. The small motion approximation
replaces the rotation matrix R by
$I - [\omega]_\times$, where $\omega$ is the axis of rotation,
the unique vector such that $R\omega = \omega$.
[Adi85] 


structure learning: 1) Learning
geometric models.
2) Learning the nodes and arcs of
graphical models, as in a Bayesian
network, a hidden Markov model or
a more general graph, e.g., a directed
acyclic graph. [KF09:Ch. 18]


structure matching: See recognition by 
components. 


structure tensor: Typically a 2D or 3D 
matrix that characterizes the dominant 
gradient directions at a point by a 
weighted combination of the gradient 
values within a given window. Larger 
window sizes increase stability but 
reduce sensitivity to small structures. 
The eigenvalues of the structure tensor 
encode the extent to which there are 
different gradient directions within the 
window. The structure tensor is impor- 
tant to interest point feature detectors, 
the Harris corner detector and also for 
characterizing texture and other appli- 
cations. [WP:Structure_tensor] 
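
A minimal 2D sketch, assuming the image gradients inside the window have already been computed and are passed in as (gx, gy) pairs. The function names and the optional `weights` argument are illustrative.

```python
import math

def structure_tensor(gradients, weights=None):
    """Weighted sum of g g^T over the window; returns (jxx, jxy, jyy)."""
    if weights is None:
        weights = [1.0] * len(gradients)
    jxx = sum(w * gx * gx for w, (gx, gy) in zip(weights, gradients))
    jxy = sum(w * gx * gy for w, (gx, gy) in zip(weights, gradients))
    jyy = sum(w * gy * gy for w, (gx, gy) in zip(weights, gradients))
    return jxx, jxy, jyy

def eigenvalues(jxx, jxy, jyy):
    """Eigenvalues of the symmetric 2 x 2 tensor, largest first."""
    mean = (jxx + jyy) / 2.0
    spread = math.sqrt(((jxx - jyy) / 2.0) ** 2 + jxy ** 2)
    return mean + spread, mean - spread

# A window containing a single gradient direction (a straight edge).
edge = eigenvalues(*structure_tensor([(1.0, 0.0)] * 4))
# A window containing two perpendicular gradient directions (a corner).
corner = eigenvalues(*structure_tensor([(1.0, 0.0), (0.0, 1.0)] * 2))
```

One large and one near-zero eigenvalue indicates a single dominant gradient direction, while two large eigenvalues indicate several directions within the window, which is the property corner detectors such as the Harris detector rely on.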


structured light: A class of techniques 
where carefully engineered illumination
is employed to simplify computation
of scene properties. Common
examples include structured light
triangulation and moiré fringe sensing.
[JKS95:11.4.1] 


structured light source calibration: 
The special case of calibration in a 
structured light system where the posi- 
tion of the light source is determined. 
[BMS98] 


structured light triangulation: Recov- 
ery of 3D structure by computing the 
intersection of a ray (or plane or other 
light shape) of light with the ray deter- 
mined by the image of the illuminated 
scene surface. [JKS95:11.4.1]


structured model: See hierarchical
model.


structured SVM: The basic support 
vector machine (SVM) is a binary 
classifier. It can be generalized to have 
a structured output space, e.g., a set 
of interdependent labels in a chain or 
on a grid. See also conditional random 
field for a similar goal using probabilis- 
tic modeling. [Mur12:1.2.14, 17.5] 


structuring element: The basic neigh- 
borhood structure of morphological 
image processing. The structuring ele- 
ment is an image, typically small, that 
defines a shape pattern. Morpholog- 
ical operations on a source image 
combine the structuring element with 
the source image in various ways. 
[JKS95:2.6] 


subband coding: A means of coding
a discrete signal for transmission.
The signal is passed through a set
of bandpass filters and each channel 
is quantized separately. The sampling 
rate of the individual channels is set 
such that, before quantization, the sum 
of the number of per-channel samples 
is the same as the number of samples 
of the original system. By varying the 
quantization for different bands, the 
number of samples may be reduced 
with small losses in signal quality. 
[WP:Subband_coding] 


subcategory recognition: Once an 
object category, such as “animal”, has 
been recognized then subcategories 
such as “horse” or “dog” can be recog- 
nized. The hierarchy can be extended, 
e.g., from “dog” to “terrier” and then 
to “Skye terrier”. [TA08] 


subcomponent: An object part used in 
a hierarchical model. [Fis87] 


subcomponent decomposition: Rep- 
resentation of a complete object part 
by a collection of smaller objects in a 
hierarchical model. [PR88] 


subgraph isomorphism: Equivalence 
of a pair of subgraphs of two given 
graphs. Given graphs A and B, the 
subgraph isomorphism problem is to 
enumerate all pairs of subgraphs (a, b) 
where: a ⊂ A; b ⊂ B; a is isomorphic
to b; and some given predicate p(a, b) 
is true. Appropriate modifications of 
the problem allow the solution of many 
graph problems including determin- 
ing shortest paths and finding maximal 
cliques. A given graph has a number of 
subgraphs exponential in the number 
of vertices and the general problem is 
NP-complete. The figure shows a sub- 
graph isomorphism with the match- 
ing graph being A:b-C:a-B:c: [JKS95: 
15.6.3] 


subjective contour: An edge perceived 
by humans in an image because of
Gestalt completion, particularly when
no image evidence is present. In the 
figure (Kanizsa’s triangle), the triangle 
that appears to float above the black 
discs is bounded partially by a subjec- 
tive contour: [Nev82:7.4] 


subpixel edge detection: Estimation 
of the location of an image edge by 
subpixel interpolation of the gradient 
operator response, to give a position
more accurately than an integer pixel 
value. [JKS95:5.7] 


subpixel interpolation: A class of tech- 
niques that essentially interpolate the 
position of local maxima in images 
to positions at a resolution smaller 
than integer pixel coordinates. Exam- 
ples include subpixel edge detection 
and interest point detection. A rule 
of thumb is that 0.1 pixel accuracy 
is often possible. If the input is an
image z(x, y) containing the response
of some kernel to a source image, a
typical approach might be as follows:

(a) Identify a local maximum where
z(x, y) ≥ z(a, b) for all (a, b) ∈
neighborhood(x, y).

(b) Fit the quadratic surface
$z = ai^2 + bij + cj^2 + di + ej + f$ to the set
of samples (i, j, z(x + i, y + j)) in
a neighborhood about (x, y).

(c) Compute the position of the local
maximum of the quadratic surface:

$$\begin{pmatrix} i \\ j \end{pmatrix} = -\begin{pmatrix} 2a & b \\ b & 2c \end{pmatrix}^{-1} \begin{pmatrix} d \\ e \end{pmatrix}$$

(d) If $-\frac{1}{2} < \{i, j\} < \frac{1}{2}$ then report a
maximum at subpixel location
(x + i, y + j).

Similar strategies apply when computing
the subpixel location of edges.
[JKS95:5.7]
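
Steps (b) to (d) can be sketched for a 3 × 3 neighborhood, where the least-squares quadratic fit has a closed form. The choice of a 3 × 3 window, the function name and the `z[y][x]` indexing are assumptions made for this illustration; the integer maximum (x, y) from step (a) is taken as given.

```python
def subpixel_peak(z, x, y):
    """Refine an integer local maximum of z at (x, y) by fitting
    z = a*i^2 + b*i*j + c*j^2 + d*i + e*j + f to the 3 x 3 neighborhood.
    Returns the subpixel (x, y) location, or None if the fit has no
    maximum within half a pixel. The image is indexed as z[y][x]."""
    offs = (-1, 0, 1)
    cells = [(i, j, z[y + j][x + i]) for i in offs for j in offs]
    total = sum(v for _, _, v in cells)
    # Closed-form least-squares coefficients for the 3 x 3 grid.
    d = sum(i * v for i, _, v in cells) / 6.0
    e = sum(j * v for _, j, v in cells) / 6.0
    b = sum(i * j * v for i, j, v in cells) / 4.0
    a = sum(i * i * v for i, _, v in cells) / 2.0 - total / 3.0
    c = sum(j * j * v for _, j, v in cells) / 2.0 - total / 3.0
    det = 4.0 * a * c - b * b
    if det <= 0.0 or a >= 0.0:
        return None            # fitted surface is not a maximum
    di = -(2.0 * c * d - b * e) / det
    dj = -(2.0 * a * e - b * d) / det
    if abs(di) < 0.5 and abs(dj) < 0.5:
        return x + di, y + dj
    return None

# A response surface that peaks at (2.3, 1.8); its integer maximum is (2, 2).
grid = [[10.0 - (x - 2.3) ** 2 - 2.0 * (y - 1.8) ** 2 for x in range(5)]
        for y in range(5)]
peak = subpixel_peak(grid, 2, 2)
```

Because the test surface is itself quadratic, the fit recovers the true peak up to rounding; on real kernel responses the result is an approximation.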


subsampling: Reducing the size of
an image by producing a new
image whose pixels sample the
original image more sparsely
(e.g., every third pixel). Interpolation
can produce more accurate samples.
To avoid aliasing, any spatial
frequency higher than the Nyquist 
frequency of the coarse grid should 
be removed by low-pass filtering the 
image. Also known as “downsam- 
pling”. [SOS00:3.6] 


subspace: A subset of a vector space that 
is closed under addition and scalar mul- 
tiplication. In data analysis, subspace 
structure may be detected, e.g., by 
principal component analysis. [Nob69: 
14.2] 


subspace analysis: The description of 
a dataset in terms of one or more
subspaces. A probabilistic formulation 
is a mixture model of probabilistic 
principal component analysis compo- 
nents. 


subspace learning: A subspace method 
where the subspace is learned from a 
number of observations. [DLTB03] 


subspace method: A general term 
describing methods that convert a vec- 
tor space into a lower-dimensional sub- 
space, e.g., projecting a set of N- 
dimensional vectors onto their first 
two principal components to pro- 
duce a 2D subspace. See principal 
component basis space. [Ho98] 


subsurface scattering: When light is 
reflected not only from the surface 
of an object, but also partially from 
the interior of the surface through a 
sequence of reflections before exit- 
ing the surface. This is an important 
factor in the appearance of surfaces 
such as human skin. [WP:Subsurface_ 
scattering] 


subtractive color: The way in which 
color appears because of the attenu- 
ation or absorption of frequencies of 
light by materials (e.g., we perceive
that something is red because it
is attenuating or absorbing all wavelengths
other than those corresponding
to red). See also additive color.
[WP:Subtractive_color] 


superellipse: A class of 2D curves,
including ellipses and Lamé curves as
special cases. The general form of the
superellipse is

$$\left(\frac{x}{a}\right)^{\alpha} + \left(\frac{y}{b}\right)^{\beta} = 1,$$

although several alternative forms
exist. Fitting superellipses to data is
difficult because of the strongly nonlinear
dependence of the shape on the
parameters α and β. The figure shows
a convex superellipse with α = β = 3
and a concave one with α = β < 1.
[WP:Super_ellipse]


supergrid: A representation that is larger 
than the original image and repre- 
sents explicitly both the image points 
and the crack edges between them.
[JKS95:3.3.4]


superparamagnetic clustering: A
method for data clustering that takes 
account of the density of nearby 
points, rather than simply the dis- 
tance to nearby points, as in other 
clustering algorithms. The algorithm 
is influenced by the Potts spin model 
of magnetic behavior, in which data 
points are assigned a spin that influ- 
ences and is influenced by neighboring 
points and their spins. [BSD96] 


superpixel: 1) A segmented group of 
pixels that are similar. 

2) A pixel in a high-resolution image.
An anti-aliasing computer graphics
technique produces lower resolution
image data by a weighted sum of the
superpixels. [FVS09]


superquadric: A 3D generalization of
the superellipse, the solution set of

$$\left(\frac{x}{a}\right)^{\alpha} + \left(\frac{y}{b}\right)^{\beta} + \left(\frac{z}{c}\right)^{\gamma} = 1.$$

As with superellipses, fitting to 3D data
is non-trivial, although some success
has been achieved. The figure shows
superquadrics, both with γ = 2; the
convex superquadric has α = β = 3
and the concave one has α = β < 1.
[SQ04:9.11]


superresolution: Generation of a high-resolution
image from a collection
of low-resolution images of the same 
object taken from different viewpoints. 
The key to successful superresolu- 
tion is in the accurate estimation of 
the registration between viewpoints. 
[WP:Super_resolution] 


supervised classification: See
classification.


supervised learning: The task in supervised
learning is to predict a response
variable or output y given an input x
based on a set of training examples.
The output may be a class label (a
classification problem), a continuous-valued
variable (a regression problem),
or a more general object. Compare 
with unsupervised learning. [Bis06: 
Ch. 1] 


support vector machine: A statistical
classifier assigning labels l to points x
in $R^n$. The support vector machine has
two defining characteristics. Firstly,
the classifier places the decision surface
that separates points $x_i$ and $x_j$,
which have different labels $l_i \neq l_j$, in
such a way as to maximize the margin
between them. Roughly speaking,
the decision surface is as far as possible
from any x. Secondly, the classifier
operates not on the raw feature vectors
x, but on high-dimensional projections
$f(x): R^n \to R^N$, N > n. However,
because the classifier only ever
requires dot products such as
$f(x) \cdot f(y)$, we never form f explicitly, but
specify instead the kernel function
$K(x, y) = f(x) \cdot f(y)$. Wherever the
dot product between high-dimensional
vectors is required, the kernel function
is used instead. [SQ04:14A.2]
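
The second characteristic can be illustrated with a small example. The sketch assumes the homogeneous quadratic kernel K(x, y) = (x . y)^2 on R^2, for which an explicit projection f into R^3 is known; the kernel choice and the function names are illustrative and not taken from the cited source.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def feature_map(x):
    """Explicit projection f: R^2 -> R^3 matching the quadratic kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def kernel(x, y):
    """K(x, y) = (x . y)^2, computed without ever forming f."""
    return dot(x, y) ** 2

x, y = (1.0, 2.0), (3.0, -1.0)
via_kernel = kernel(x, y)
via_projection = dot(feature_map(x), feature_map(y))
```

Both routes give the same number, but the kernel needs only a dot product in the original n-dimensional space, which matters when N is very large or infinite.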


support vector ranking: A modification
of the support vector machine
(SVM) to tackle the ranking problem.
[SC04:8.1]


support vector regression: A range
of techniques for function estimation
that uses a subset of the data to 
determine a function to model the 
data. See also support vector machine 
and hinge loss function. [WP:Support_ 
vector_machine#Regression] 


SURF: Speeded up robust features. An
image feature detector and descriptor.
The detector achieves improved
speed by replacing the scale-space
Gaussian smoothing by ±1 weighted
masks based on Haar wavelets (see
Haar transform). The descriptor associated
with the detected points is based
on a 4 × 4 set of square subregions,
each of which is described by four values,
again computed using the Haar
wavelets. The result is an operator
that is similar to, but is several times
faster and more stable than, the SIFT
operator. The figure shows some SURF
points overlaying the original image.
[BETO08]

[Figure panels: INPUT, SURF POINTS]


Mathematically, it is a 2D subset of R? 
that is almost everywhere locally topo- 
logically equivalent to the open unit 
ball in R?. This means that a cloud of 
points is not a surface, but the sur- 
face may have cusps or boundaries. 
A parameterization of the surface is a 
function from R? to R? that defines 
the 3D surface point x(u, v) as a func- 
tion of 2D parameters (u, v). Restrict- 
ing (u, v) to subsets of R? yields a sub- 
set of the surface. The surface is the set 
S of points on it, defined over a domain 
D [WP:Surface]: 


S = {X(u, v)u, v) € D C R?} 


surface area: Given a parametric 


surface S = {x(u, v)|(u, v) € D C R?}, 
with unit tangent vectors x,,(u, v) and 
X (u, v), the area of the surface is 
[TV98:A.5] 


i, (Xau, V) X Xy(u, v|dudv 
Ss 


surface boundary representation: A 


method of defining surface models in 
computer graphics. It defines a 3D 
object as a collection of surfaces with 
boundaries. The model topology states 
which surfaces are connected and 
which boundaries are shared between 
patches. The figure shows three faces: 


AB-rep model of this image comprises: 
faces A, B, C along with the param- 
eters of their 3D surfaces; edges 1-9 
with 3D curve descriptions; vertices a- 
g; and the connectivities of these enti- 
ties, e.g., face B is bounded by curves 
1-4 and curve 1 is bounded by vertices 
b and c. [JKS95:15.3.2] 


surface class: Koenderink’s classifi- 
cation of local surface shape into 
classes based on two functions of the 


surface: In general parlance, a 2D shape 
that is located in three dimensions. 
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principal curvatures: the shape index 
S = —2 tan“! “£m and the curved- 
m KM—Km 


ness C = | į (kf + K2) where km > Km 
are the principal curvatures. The sur- 
face classes are planar (C = 0), hyper- 
boloid (|S| < 2) or ellipsoid (S| > 2) 
and cylinder Ë < |S| < 2), subdivided 
into concave (S < 0) and convex (S > 
0). Alternative classification systems 
exist based on the mean curvature and 
Gaussian curvature, or the principal 
curvature. The former distinguishes 
more classes of hyperboloid surfaces. 
[KvD79] 
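A minimal sketch of this classification, assuming the shape index and curvedness formulas of Koenderink and class thresholds at 3/8 and 5/8. The function names and the tolerance `eps` are illustrative assumptions, and which sign counts as convex depends on the chosen surface normal direction.

```python
import math

def shape_index_and_curvedness(k_max, k_min):
    """Return (S, C) for two principal curvatures; S is None on a plane."""
    if k_max < k_min:
        k_max, k_min = k_min, k_max
    curvedness = math.sqrt(0.5 * (k_max ** 2 + k_min ** 2))
    if curvedness == 0.0:
        return None, 0.0  # planar point: the shape index is undefined
    # atan2 also covers the umbilic case k_max == k_min (zero denominator).
    s = -(2.0 / math.pi) * math.atan2(k_max + k_min, k_max - k_min)
    return s, curvedness

def surface_class(k_max, k_min, eps=1e-9):
    """Label a point as planar, hyperboloid, cylinder or ellipsoid."""
    s, c = shape_index_and_curvedness(k_max, k_min)
    if s is None or c < eps:
        return "planar"
    if abs(s) < 3.0 / 8.0:
        kind = "hyperboloid"
    elif abs(s) > 5.0 / 8.0:
        kind = "ellipsoid"
    else:
        kind = "cylinder"
    if s == 0:
        return kind  # symmetric saddle: neither concave nor convex
    return ("concave " if s < 0 else "convex ") + kind
```

For example, `surface_class(0.0, -1.0)` describes a point with one flat and one curved direction and so falls in the cylinder band.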


surface completion: The process of 
filling in the holes in a scanned sur- 
face, which may have arisen because 
of defects in the sensing process or the 
inability of the sensor to see the given 
portion of the surface, e.g., because of 
occlusion. Algorithms that fill in the 
holes typically exploit local continu- 
ities around the hole and use image 
and shape data from elsewhere in the 
image or scene. [BF05] 


surface continuity: Mathematically
defined at a single point parameterized
by $(u, v)$ on the surface
$\{\mathbf{x}(u, v) \mid (u, v) \in D \subset \mathbb{R}^2\}$. The surface
is continuous at that point if infinitesimal
motions in any direction away
from $(u, v)$ can never cause a sudden
change in the value of $\mathbf{x}$. The surface
is “everywhere continuous” (or just
“continuous”) if it is continuous at all
points in $D$. [ML05]


surface curvature: A measure of the
shape of a 3D surface (the characteristics
of the surface that are constant
if the surface is rotated or translated
in 3D space). The shape is specified
by the surface’s principal curvatures
at each point. To compute the principal
curvatures, we need the first
and second fundamental forms. In the
differential geometry of surfaces, the
first fundamental form encapsulates
the arc length of curves in a surface.
If the surface is defined in parametric
form by a smooth function $\mathbf{x}(u, v)$,
the surface’s tangent vectors at $(u, v)$
are given by the partial derivatives
$\mathbf{x}_u(u, v)$ and $\mathbf{x}_v(u, v)$. From these, we
define the dot products
$E(u, v) = \mathbf{x}_u \cdot \mathbf{x}_u$,
$F(u, v) = \mathbf{x}_u \cdot \mathbf{x}_v$,
$G(u, v) = \mathbf{x}_v \cdot \mathbf{x}_v$.
Then arc length along a curve in the
surface is given by the first fundamental
form $ds^2 = E\,du^2 + 2F\,du\,dv + G\,dv^2$.
The matrix of the first fundamental
form is the 2 × 2 matrix

$$\mathbf{I} = \begin{pmatrix} E & F \\ F & G \end{pmatrix}$$

The second fundamental form encapsulates
the curvature information. The
second partial derivatives are $\mathbf{x}_{uu}(u, v)$
etc. The surface normal at $(u, v)$ is the
unit vector $\mathbf{n}(u, v)$ along $\mathbf{x}_u \times \mathbf{x}_v$. Then
the matrix of the second fundamental
form at $(u, v)$ is the 2 × 2 matrix

$$\mathbf{II} = \begin{pmatrix} \mathbf{x}_{uu} \cdot \mathbf{n} & \mathbf{x}_{uv} \cdot \mathbf{n} \\ \mathbf{x}_{vu} \cdot \mathbf{n} & \mathbf{x}_{vv} \cdot \mathbf{n} \end{pmatrix}$$

If $\mathbf{d} = (du, dv)^\top$ is a direction in the
tangent space (so its 3D direction is
$\mathbf{t}(\mathbf{d}) = du\,\mathbf{x}_u + dv\,\mathbf{x}_v$), then the normal
curvature in the direction $\mathbf{d}$ is given by

$$\kappa(\mathbf{d}) = \frac{\mathbf{d}^\top \mathbf{II}\, \mathbf{d}}{\mathbf{d}^\top \mathbf{I}\, \mathbf{d}}$$

The minima and maxima
of $\kappa$ as $\mathbf{d}$ varies at a point $(u, v)$ are the
principal curvatures at the point, given
by the generalized eigenvalues of
$\mathbf{II}\,\mathbf{z} = \kappa\,\mathbf{I}\,\mathbf{z}$, i.e., the solutions to the quadratic
equation in $\kappa$ given by $\det(\mathbf{II} - \kappa \mathbf{I}) = 0$.
[FP03:19.1.2]
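The computation described in this entry can be sketched numerically. The following assumes a parametric surface supplied as a Python callable and estimates the derivatives by central finite differences; the function name, the step size `h` and the use of finite differences are illustrative assumptions, not part of the entry.

```python
import numpy as np

def principal_curvatures(x, u, v, h=1e-3):
    """Principal curvatures of the surface x(u, v) at (u, v), sorted ascending."""
    # Tangent vectors and second derivatives by central differences.
    xu = (x(u + h, v) - x(u - h, v)) / (2 * h)
    xv = (x(u, v + h) - x(u, v - h)) / (2 * h)
    xuu = (x(u + h, v) - 2 * x(u, v) + x(u - h, v)) / h ** 2
    xvv = (x(u, v + h) - 2 * x(u, v) + x(u, v - h)) / h ** 2
    xuv = (x(u + h, v + h) - x(u + h, v - h)
           - x(u - h, v + h) + x(u - h, v - h)) / (4 * h ** 2)
    # Unit surface normal along xu x xv.
    n = np.cross(xu, xv)
    n = n / np.linalg.norm(n)
    # Matrices of the first and second fundamental forms.
    first = np.array([[xu @ xu, xu @ xv], [xu @ xv, xv @ xv]])
    second = np.array([[xuu @ n, xuv @ n], [xuv @ n, xvv @ n]])
    # Generalized eigenvalues of (second) z = k (first) z.
    k = np.linalg.eigvals(np.linalg.solve(first, second))
    return np.sort(k.real)

def saddle(u, v):
    """Example surface z = u^2 - v^2."""
    return np.array([u, v, u * u - v * v])

k_saddle = principal_curvatures(saddle, 0.0, 0.0)
```

At the origin of the saddle the two principal curvatures have equal magnitude and opposite sign, as expected for a hyperbolic point.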


surface description: A description of
a surface, which can include color,
reflectance or shape texture, local 
differential geometry or global proper- 
ties, such as its elongatedness or shape 
class (e.g., spherical). 


surface discontinuity: A point at which
the surface, or its normal vector, is not
continuous. These are often fold edges, 
where the surface normal has a large 
change in direction. See also surface 
continuity. [SS92] 


surface fitting: A family of parametric
surfaces $\mathbf{x}_\theta(u, v)$ is parameterized by a
vector of parameters $\theta$. For example,
the family of 3D spheres is parameterized
by four parameters: three for the
center and one for the radius. Given a
set of n sampled data points $\{\mathbf{p}_1, \ldots, \mathbf{p}_n\}$,
the task of surface fitting is to find the
parameters $\theta$ of the surface that best
fits the given data. Common interpretations
of “best fit” include finding the
surface for which the sum of Euclidean
distances from the points to the surface
is smallest or that maximizes the probability
that the data points could be
noisy samples from the surface. General
techniques include least squares
fitting or nonlinear optimization over
the surface parameters. [JKS95:13.7]
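For the sphere example, a least squares fit can be sketched as follows. This version minimizes an algebraic error that is linear in the unknowns rather than the sum of Euclidean distances, so it is one possible interpretation of "best fit" and not the only one; the function name `fit_sphere` is an assumption for illustration.

```python
import numpy as np

def fit_sphere(points):
    """Algebraic least squares sphere fit; returns (center, radius)."""
    p = np.asarray(points, dtype=float)
    # |p|^2 = 2 c.p + (r^2 - |c|^2) is linear in the unknowns (c, k).
    A = np.hstack([2.0 * p, np.ones((len(p), 1))])
    b = np.sum(p * p, axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    center = sol[:3]
    radius = float(np.sqrt(sol[3] + center @ center))
    return center, radius

# Synthetic noise-free samples on a sphere with center (1, -2, 3), radius 2.
rng = np.random.default_rng(0)
directions = rng.normal(size=(50, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
samples = np.array([1.0, -2.0, 3.0]) + 2.0 * directions
center, radius = fit_sphere(samples)
```

With noisy data the algebraic estimate is often used to initialize a nonlinear optimization over the four sphere parameters.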


surface interpolation: Generating a 
continuous surface from sparse data 
such as 3D points. For example, given 
a set of n sampled data points
$S = \{\mathbf{p}_1, \ldots, \mathbf{p}_n\}$, one might wish to generate
other points in $\mathbb{R}^3$ that lie on a smooth
surface that passes through all the
points in $S$. Techniques include radial
basis functions, splines and natural 
neighbor interpolation. [JKS95:13.6] 


surface light field: A function, defined 
at every point x on a surface, that 
assigns an RGB color $\mathbf{c}(\mathbf{x}, \mathbf{d})$ to every
ray d exiting from that point. The 
concept is mainly used for computer 
graphics as it allows more realistic 
rendering of surfaces with specularity. 
[WAA+00] 


surface matching: Identifying corre- 
sponding points on two 3D sur- 
faces, often as a precursor to surface 
registration. [ZH99] 


surface mesh: A surface boundary 
representation in which the faces are 
typically planar and the edges are 
straight lines. Such representations are 
often associated with efficient data 
structures (e.g., winged edge and quad 
edge) that allow fast computation 
of various geometric and topological 
properties. Hardware acceleration of 
polygon rendering is a feature of many 
computers. [JKS95:13.5.1] 


surface model: A geometric model of 
a surface, often for the purpose of 
rendering or object recognition. There 
are many types of surface model, 
e.g., triangulated models or freeform 
surfaces. [BB82:Sec. 9.2] 


surface normal: The direction perpendicular
to a surface. For a parametric
surface $\mathbf{x}(u, v)$, the normal is the
unit vector parallel to
$\frac{\partial \mathbf{x}}{\partial u} \times \frac{\partial \mathbf{x}}{\partial v}$. For an
implicit surface $F(\mathbf{x}) = 0$, the normal
is the unit vector parallel to
$\nabla F = \left( \frac{\partial F}{\partial x}, \frac{\partial F}{\partial y}, \frac{\partial F}{\partial z} \right)$.
The figure shows the surface
normal as defined by the small
neighborhood at point X: [TV98:A.5]


surface orientation: The convention 
that decides whether the surface 
normal or its negation points outside 
the space bounded by the surface. 
[JKS95:9.2] 


surface patch: A surface whose domain 
is finite. [JKS95:13.5.2] 


surface reconstruction: The process of 
constructing a 3D surface model from 
a set of related data, such as a set of 
3D points, a set of cross-sections from 
scanner data (e.g., MRI) or a set of filled
voxels. [HDD+92] 


surface reflectance: A description of 
the manner in which a surface 
interacts with light. See reflectance. 
[JKS95:9.1.2] 


surface roughness: The measure of the 
shape texture of a surface, i.e., the 
variation of the surface away from an 
aligned ideal smooth surface. The tex- 
ture affects the appearance of the sur- 
face because of the way in which light 
is reflected, by the various possibili- 
ties for the reflected light. Image analy- 
sis applications might attempt surface 
roughness characterization as part of 
a manufacturing inspection process. 
[WP:Surface_roughness] 


surface roughness characterization: 
An inspection application where esti- 
mates of the roughness of a surface 
are made, e.g., when inspecting spray- 
painted surfaces. [Mye62] 


surface segmentation: Division of a
surface into simpler patches. Given a
surface defined over a domain $D$, determine
a partition $D = \{D_1, \ldots, D_n\}$ on which
some goodness criteria are well satisfied.
For example, it might be required
that the maximal distance of a point
of each $D_i$ from the best-fit quadric
surface is below a threshold. See also
range data segmentation. [SQ04:8.6]


surface shape classification: The use 
of curvature information of a surface 
to classify each point on the surface 
as locally ellipsoidal, hyperbolic, cylin- 
drical or planar. See also surface class. 
For example, given a parametric sur- 
face x(u, v), the classification function 
c(u, v) is a mapping from the domain 
of (u, v) to a set of discrete class labels. 
[FJ88] 


surface shape discontinuity: A discon- 
tinuity in the value of a surface shape 
classification over a surface, e.g. a dis- 
continuity in the classification function 
c(u, v). The figure shows a discontinu- 
ity at the fold edge at point X: [Wit81] 


surface tracking: Identification of the 
same surface through the frames of a 
video sequence. [GU89] 


surface triangulation: See surface
mesh.


surflet: An oriented surface point $(\mathbf{p}, \mathbf{n})$
consisting of a surface point $\mathbf{p}$ plus
its surface normal $\mathbf{n}$. A pair of surflets
gives a local descriptor that
is translation invariant and rotation 
invariant; an aggregation of these into 
a histogram creates a compact descrip- 
tor of a 3D object. [WHH03] 


surveillance: An application area of 
vision concerned with the monitor- 
ing of activities in a scene. Typically 
this will involve at least background 
modeling and human motion analysis. 
[WP:Surveillance] 


SUSAN corner finder: A popular inter- 
est point detector developed by Smith 
and Brady. Combines the smooth- 
ing and central difference stages of 
a derivative-based operator into a
single center-surround comparison.
[WP:Corner_detection#The_SUSAN_ 
corner_detector] 


SVD: See singular value decomposition. 


SVM: See support vector machine. 


Swendsen-Wang algorithm: An algorithm
for random simulation by Markov
chain Monte Carlo methods in a discrete
Markov random field (MRF).
[SW87] 


swept object representation: A 
volumetric representation scheme 
in which 3D objects are formed by 
sweeping a 2D cross section along 
an axis or trajectory. A brick can 
be formed by sweeping a rectangle. 
Some schemes, such as the geon or 
generalized cylinder representation, 
allow changes to the size of the 
cross section and curved trajectories. 
The figure shows a cone defined by 
sweeping a circle along a straight 
axis with a linearly decreasing radius: 
[JKS95:15.3.2] 


symbolic: Inference or computation 
expressed in terms of a set of symbols 
rather than a signal. Where a digital sig- 
nal is a discrete representation of a con- 
tinuous function, symbols are inher- 
ently discrete. For example, an image 
(signal) is converted to a list of the 
names of people who appear in it (sym- 
bols). [WP:Symbolic_computation] 


symbolic object representation: Rep- 
resentation of an object by lists of sym- 
bolic terms such as “plane”, “quadric”, 
“corner”, or “face” etc., rather than the 
points or pixels of the shape itself. 
The representation may include the 
shape and position of the objects. 


[JKS95:15.6.3] 


symmetric axis transform (SAT): A 
transformation that locates all points 
on the skeleton of a region by identifying
those points that are the locus
of centers of bitangent circles. See also
medial axis skeletonization. The figure 
shows the medial axis derived from a 
binary segmentation of a moving sub- 
ject: [Nal93:9.2.2] 


symmetry: A shape that remains 
invariant under at least one non- 
identity transformation from some pre- 
specified transformation group. For 
example, the set of points comprising 
an ellipse is the same after the ellipse 
is subjected to the Euclidean trans- 
formation of rotation by 180° about 
its center. The image of the outline 
of a surface of revolution under per- 
spective projection is invariant under a 
certain homography, so the silhouette 
exhibits a projective symmetry. Affine 
symmetry is sometimes known as skew 
symmetry and symmetry induced by 
reflection about a line is called “bilat- 
eral symmetry”. [SQ04:9.3] 


symmetry detection: A class of algo- 
rithms that search for symmetry in 
imaged curves, surfaces and point sets. 
[SS97a] 


symmetry group: For a given object, 
the group of all isometries (transfor- 
mations) that leave the object identical 
to the original. For example, a square
with its center of mass at the origin has
both rotational symmetry (four isometries,
including the identity) and reflection
symmetry (four isometries), so the
symmetry group contains 4 + 4 = 8
members. [Wey52]
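The square example can be checked by brute-force enumeration. This sketch assumes an axis-aligned square with integer vertex coordinates, so that every symmetry is an orthogonal matrix with entries in {-1, 0, 1}; the function name `symmetry_group` is an illustrative assumption.

```python
from itertools import product

def symmetry_group(vertices):
    """All 2x2 integer orthogonal matrices mapping the vertex set onto itself."""
    verts = set(vertices)
    group = []
    for a, b, c, d in product((-1, 0, 1), repeat=4):
        # Orthogonality: rows have unit length and are perpendicular.
        if a * a + b * b != 1 or c * c + d * d != 1 or a * c + b * d != 0:
            continue
        image = {(a * x + b * y, c * x + d * y) for x, y in verts}
        if image == verts:
            group.append(((a, b), (c, d)))
    return group

def determinant(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

square = [(1, 1), (-1, 1), (-1, -1), (1, -1)]
group = symmetry_group(square)
rotations = [m for m in group if determinant(m) == 1]     # includes identity
reflections = [m for m in group if determinant(m) == -1]
```

Counting `group`, `rotations` and `reflections` separates the rotational isometries from the reflections; a non-square rectangle passed to the same function yields a smaller group.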


symmetry line: The axis of a bilateral 
symmetry. In the figure, the solid line 
rectangle has two dashed lines of symmetry:
[WP:Reflection_symmetry]


symmetry plane: The axis of a bilateral
symmetry in 3D. In the figure,
the dashed lines show three symmetry 
planes of a cube: [HMS98] 


sync pulse: Abbreviation of “synchro- 
nization pulse”. Any electrical sig- 
nal that allows two or more elec- 
tronic devices to share a common time 
frame. Commonly used to synchro- 
nize the capture instants of two cam- 
eras in a stereo image capture system. 
[Gal90:4.1.1] 


syntactic pattern recognition: Object 
identification by converting an image 
of the object into a sequence or array 
of symbols and using grammar-parsing 
techniques to match the sequence 
of symbols to grammar rules in a 
database. [Sch89:Ch. 6] 


syntactic texture description: Descrip- 
tion of texture in terms of gram- 
mars of local shapes or image patches 
and transformation rules. Good for 
modeling synthetic artificial textures. 
[SHB08: 14.2] 


synthetic aperture radar (SAR): An 
imaging device that transmits long- 
wavelength (in comparison to visible
light) radio waves from airborne or
space platforms and builds a 2D image 
of the intensities of the returned reflec- 
tions. Clouds are transparent at these 
(centimeter) wavelengths and the 
active transmission means that images 
may be taken at night. The images 
are captured as a sequence of low- 
resolution (“small aperture”) 1D slices 
as the platform translates across the tar- 
get area, with a final high-resolution 
(“synthetic aperture”) image recov- 
erable via a Fourier transform after 
all slices have been captured. The 
time-of-flight of the returned signal
determines the distance from the trans-
mitter and therefore, assuming a pla- 
nar (or known geometry) surface, the 
pixel location in the cross-path direc- 
tion. [WP:Synthetic_aperture_radar] 


systolic array: A class of parallel computer
in which processors are arranged
in a directed graph. The processors 
synchronously receive data from one 
set of neighbors (e.g., North and West 
in a rectangular array), perform a com- 
putation, and transmit the computed 
quantity to another set of neighbors 
(e.g., South and East). [Sch89:Ch. 8]


tabu search: A heuristic search technique
that seeks to avoid cycles by
forbidding or penalizing moves tak- 
ing the search to previously vis- 
ited solution spaces (hence “tabu”). 
[WP:Tabu_search] 


tangent angle function: Given a
curve $(x(t), y(t))$, the function
$\theta(t) = \tan^{-1} \frac{\dot{y}(t)}{\dot{x}(t)}$. [TC94]
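A minimal sketch of the tangent angle function for a sampled curve. It assumes uniformly spaced parameter samples and uses `arctan2` instead of a plain inverse tangent of the ratio, which keeps the correct quadrant and avoids division by zero; both choices are assumptions made for illustration.

```python
import numpy as np

def tangent_angle(x, y):
    """Tangent angle (radians) at each sample of the curve (x[i], y[i])."""
    dx = np.gradient(np.asarray(x, dtype=float))  # finite-difference x'(t)
    dy = np.gradient(np.asarray(y, dtype=float))  # finite-difference y'(t)
    return np.arctan2(dy, dx)

# A straight line at 45 degrees has a constant tangent angle of pi/4.
t = np.linspace(0.0, 1.0, 11)
theta_line = tangent_angle(t, t)
```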


tangent plane: The plane passing 
through a point on a surface that is 
perpendicular to the surface normal. 
[FP03:19.1.2] 


tangent space: In differential geometry, 
the vector space formed by the set of all 
tangent vectors at a point on a differen- 
tiable manifold. This is a generalization 
of the notion of the tangent plane to a 
surface. [Lov10:3.3] 


tangential distortion (lens): A particu- 
lar lens aberration created, among oth- 
ers, by lens decentering, usually mod- 
eled only in high-accuracy calibration 
systems. [LL96] 


target image: The image resulting from 
an image-processing operation: [EF01]


[Figure panels: Source Image; Target Image]


target recognition: See automatic
target recognition.
[WP:Automatic_target_recognition]


target tracking: The following of a given 
object over time, possibly through
repeated detections of the object or
by a single detection followed by 
feature tracking. Traditional applica- 
tions are military, but the term might 
now include other objects, e.g., peo- 
ple or vehicle tracking. There might be 
single or multiple simultaneous targets. 
[FP03:Ch. 17] 


task parallelism: Parallel processing 
achieved by the concurrent execu- 
tion of relatively large subsets of 
a computer program. A large sub- 
set might be defined as one whose 
run time is of the order of tens 
of milliseconds. The parallel tasks 
need not be identical, e.g., from a 
binary image, one task may compute a 
moment while another computes the 
perimeter. [WP:Task_parallelism] 


tattoo retrieval: An application of image 
retrieval in which the patterns are tat- 
toos. The intent is to help with identify- 
ing people in legal situations. [JLRG09] 


Tchebichef/Chebyshev moments: A 
set of moments based on the orthog- 
onal Tchebichef/Chebyshev polynomi- 
als, defined on a set of discrete points 
which makes them suitable for digital 
images. The moments can be used 
to define properties of shapes suit- 
able for object recognition, or for 
image compression, noise reduction 
etc. [FSZ09:6.2.5] 


tee junction: An intersection between 
line segments (possibly representing 
edges) where a straight line meets and 
terminates somewhere along another 
line segment. See also blocks world. 
A tee junction can give useful depth- 
ordering cues. In this figure, we can 
hypothesize that surface C lies in front 
of the surfaces A and B, given the tee 
junction at p: [Nal93:4.1.1]


telecentric optics: A lens system
arranged such that moving the image 
plane along the optical axis does not 
change the magnification or image 
position of imaged world points. One 
embodiment is to place an aperture in 
front of the lens so that when an object 
is imaged off the focal plane of the lens, 
the center of the (blurred) object is the 
ray through the center of the aperture, 
rather than the center of the lens. Plac- 
ing the aperture at the lens’s front focal 
plane will ensure these rays are parallel 
after the lens: [WP:Telecentric_lens]


telepresence: Interaction with objects
at a location remote from the user 
via vision or robotic devices. Examples 
include slaving of remote cameras to 
the motion of a head-mounted display 
worn by the user, transmission of audio 
from the remote location, use of local 
controls to operate remote machinery 
and haptic (i.e., touch) feedback from 
the remote to the local environment. 
[WP:Telepresence] 


template-based representation: A 
form of model or object representation 
in which the representation is com- 
posed of some form of image-based 
modeling, such as an image of 
the shape, the edges that bound 
the desired shape or a process for 
generating the template, such as a
principal component representation.
[BB82:3.2.1] 


template image: An image used as
part of a template-based representation
for template matching, detection 
or template tracking of an object. 
[BB82:3.2.1] 


template matching: A strategy for location
of an object in an image. The
template, a 2D image of the object, 
is compared with all windows of the 
same size as the template in the image 
to be searched. Windows where the 
difference with the template (as com- 
puted by, e.g., normalized correlation 
or a sum of squared differences (SSD)) 
is within a threshold are reported as 
instances of the object. This is inter- 
esting as a brute-force matching strat- 
egy. To obtain invariance to scale, rota- 
tion or other transformations, the tem- 
plate must be subjected explicitly to 
the transformations. [FP03:25.3.2] 
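The brute-force strategy can be sketched with a sum of squared differences (SSD) score. The function name `match_template_ssd` and the way the threshold is passed are assumptions for illustration; practical implementations usually replace the double loop with a faster correlation.

```python
import numpy as np

def match_template_ssd(image, template, threshold):
    """Return [(row, col, ssd)] for every window whose SSD is <= threshold."""
    image = np.asarray(image, dtype=float)
    template = np.asarray(template, dtype=float)
    th, tw = template.shape
    ih, iw = image.shape
    matches = []
    # Compare the template with all windows of the same size.
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            window = image[r:r + th, c:c + tw]
            ssd = float(np.sum((window - template) ** 2))
            if ssd <= threshold:
                matches.append((r, c, ssd))
    return matches

# Toy example: a 2x2 pattern embedded in an otherwise empty image.
pattern = np.array([[1.0, 2.0], [3.0, 4.0]])
scene = np.zeros((8, 8))
scene[3:5, 4:6] = pattern
found = match_template_ssd(scene, pattern, threshold=0.0)
```

The sketch gives no invariance to scale or rotation; as the entry notes, transformed copies of the template would have to be matched explicitly.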


template tracking: A method of target
tracking in video that uses small
amounts of local image data as the tem- 
plates to be tracked. The templates 
could be based on neighborhoods; 
image features taken from previous 
frames (e.g., SIFT features); or ideal 
models (e.g., based on drawings). The 
core idea is to search for and match 
the template (e.g., via correlation) with 
image data in a new image to find 
the best correspondence. The figure 
shows a template match using corre- 
lation between a template (on the left) 
and patches (on the right): [JD02] 


[Figure panels: BOOK AS TEMPLATE; BOOK TRACKED IN NEW IMAGE]


temporal alignment: The process of
making time-to-time correspondences
between two time-based signals. Exam- 
ples are the frames in video sequences, 
objects or motions extracted from the 
video, properties measured from data 
in the video, or other signals like 
speech. A key problem is that sig- 
nals may not advance at the same 
rate and there may be local variations 
in rates (e.g., synchronizing two peo- 
ple running cross country). Dynamic 
time-warping is a common technique 
used for temporal alignment. [CI02]
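A minimal sketch of dynamic time warping for two 1D signals, assuming an absolute-difference cost and the basic step pattern (match, insertion, deletion). The function name `dtw` and the cost choice are illustrative assumptions.

```python
import numpy as np

def dtw(a, b):
    """Return (total cost, warping path) aligning sequences a and b."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    # Accumulate the cheapest alignment cost for every prefix pair.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack from the end to recover the time-to-time correspondences.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(D[i - 1, j - 1], i - 1, j - 1),
                 (D[i - 1, j], i - 1, j),
                 (D[i, j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return float(D[n, m]), path

# The second signal is the first one played at a locally varying rate.
fast = [0, 1, 2, 3]
slow = [0, 0, 1, 2, 2, 3]
total_cost, path = dtw(fast, slow)
```

Each pair `(i, j)` in `path` states that sample `i` of the first signal corresponds to sample `j` of the second.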


temporal averaging: Any procedure
for noise reduction in which a sig-
nal that is known to be static 
over time is sampled at differ- 
ent times and the results averaged. 
[LGZ+94] 


temporal correlation: The similarity of
two time-varying signals and the pro-
cess of assessing the similarity. For 
example, the pattern of leg motion in 
a runner over one gait cycle is very 
similar to the pattern of other cycles 
and other runners; the dropping of a 
drinking glass is followed by the glass 
breaking when it hits the floor; the 
image noise at a pixel is often related 
to the noise measured from the next 
image frame. The similarity of consec- 
utive video frames can be exploited to 
improve video compression. [BAP06] 


temporal event analysis: 1) An analysis
that considers the temporal correlation
of two events. For example, it may 
assess how often the event of a per- 
son looking at a store window display 
leads to the event of that person buying 
something in the shop. 

2) An analysis that assesses the match 
of some observed data to a temporal 
model of the data, e.g., the probabil- 
ity that a sequence of observations is 
explained by a given hidden Markov 
model. [ZI01] 


temporal model: A model that encodes
how some phenomenon changes over
time. The model could be explicit (e.g., 
a cosine signal), probabilistic (e.g., 
a hidden Markov model) or statisti- 
cal (e.g. a temporal Gaussian process 
model). [BS00a]



temporal offset: 1) The amount of time
shift needed for temporal alignment
of two time-based sequences or sig- 
nals, e.g., how many seconds or frames 
between the start of two gait cycles. 

2) The time between two events, e.g., 
the start and end of a meal. [MSMP08] 


temporal process: A process or signal
that changes over time. See also
time series and time-varying random 
process. 


temporal reasoning: The process of
analyzing the temporal relationships
between signals or events. For exam- 
ple, one might attempt to deter- 
mine if an event seen in one video 
stream is a good predictor of an event 
seen in another video stream. See 
temporal event analysis. [WP:Spatial- 
temporal_reasoning] 


temporal representation: A model representation
that encodes the dynamics
of how an object’s shape or position 
can vary over time. [SCH08] 


temporal resolution: The frequency of
observations with respect to time (e.g.,
one per second) as opposed to the 
spatial resolution. [Kul71] 


temporal segmentation: The segmentation
of a time signal into discrete,
temporally contiguous subsets. For example, a
segmentation might identify when a given
action starts and stops (e.g., a person
standing up) or when a situation changes (as
in video cut analysis), or a repetitive signal
may be split into complete cycles.
[WA94] 


temporal stereo: 1) Stereo achieved
through movement of one camera
rather than using two separate cam- 
eras. 

2) Integration of multiple stereo views 
of a dynamic scene to produce a better 
estimate of each view. [NA02] 


temporal synchronization: See
temporal alignment.


temporal topology: A way of expressing
the temporal relationships
between signals, events and activities 
independent of actual times. For 
example, one can say that event A 
happened before event B, without
specifying how long before. An anal-
ogous situation is in the relation of a 
geometric model to a graph model, in 
which the geometric model specifies 
where each feature is, but the graph 
model may only specify adjacency 
between features. [NRCC08] 


temporal tracking: See tracking. 


tensor product surface: A parametric 
model of a curved surface commonly 
used in computer modeling and graphics
applications. The surface shape
is defined by the product of two 
polynomial (usually cubic) curves in 
the independent surface coordinates. 
[JKS95:13.5.3] 


terrain analysis: Analysis and interpre- 
tation of data representing the shape 
of the planet’s surface. Typical data 
structures are digital elevation maps or 
triangulated irregular networks (TINs).
[WG00:1.1] 


tessellated viewsphere: A division of 
the viewsphere into distinct subsets 
of (approximately) equal area. Often 
used as a data structure for representing
functions of the form $f(\mathbf{n})$ where
$\mathbf{n}$ is a unit normal vector in $\mathbb{R}^3$. Typically
constructed by subdivision of the
viewsphere into a polygon mesh such 
as an icosahedron: [HY93] 


test set: The set used to verify a classifier 
or other algorithm. The test set con- 
tains only examples not included in the 
training set. [WP:Test_set] 


tetrahedral spatial decomposition: A 
method of decomposing 3D space 
into packed tetrahedrons instead of
the more commonly used rectangu-
lar voxel decomposition. A tetrahe- 
dral decomposition allows a recur- 
sive subdivision of a tetrahedron into 
eight smaller tetrahedra. The figure 
illustrates the decomposition with one 
of the eight smaller volumes shaded: 
[KL96] 


texel: See texture element. 
texon: See texture element. 


text analysis: In the context of image 
analysis, this term refers to the detec- 
tion and decoding of characters and 
words in image and video data. 
[WP:Optical_character_recognition] 


textel: See texture element. 


texton: Julesz’s 1981 definition of the 
units in which texture might be per- 
ceived. In the texton-based view, a tex- 
ture is a regular assembly of textons. 
[Jul81]


texton boost: An approach to RGB pixel 
classification to label images in terms 
of textons: distinct clusters of textures 
(found by clustering multi-scale image 
derivatives), spatial layout and image 
context. Layout is based on rectangular 
regions over which texton type should 
be constant. Pairs of (texton,layout 
regions) become weak classifiers of 
pixels and a strong classifier is learned 
through a boosting method. Context 
exploits the relative spatial placement 
of the layout regions. Texton boosting 
can be used simultaneously for region 
segmentation and labeling. [SWRC09]


texture: The phenomenon by which uniformity
is perceived in regular (etymologically,
“woven”) patterns of (possibly
irregular) elements. In computer
vision, texture usually refers to pat- 
terns in the appearance or reflectance 
on a surface. The texture may be reg- 
ular, i.e., it may satisfy some texture 
grammar, or it may be statistical 
texture, i.e., the distribution of pixel 
values may vary over the image. Tex- 
ture can also refer to variations in the 
local shape on a surface, e.g., its degree 
of roughness. See also shape texture. 
[NA05:8.2] 


texture-based image retrieval: 
Content-based image retrieval that uses 
texture as its classification criterion. 
[WP:Content-based_image_retrieval] 


texture boundary: The boundary
between adjacent regions in texture
segmentation. Also, the boundary 
perceived by humans between two 
regions of different textures. The fig- 
ure shows the boundary between three 
regions of different color and shape 
texture: [KE89] 


texture classification: Assignment of 
an image (or a window of an image) to 
one of a set of texture classes, which 
are typically defined by presentation by 
a human of training set images repre- 
senting each class. The basis of texture 
segmentation. [NA05:8.4] 


texture descriptor: A vector-valued 
function computed on an image sub- 
window that is designed to produce 
similar outputs when applied to differ- 
ent subwindows of the same texture. 
The size of the image subwindow con- 
trols the scale of the detector. If the 
response at a pixel position (x, y)
is computed as the maximum over
several scales, an additional scale out- 
put s(x, y) is available. See also texture 
primitive. [NA05:8.3] 


texture direction: The texture gradient 
or a 90° rotation thereof. [BB82:6.5] 


texture element (texel): A small geo- 
metric pattern that is repeated fre- 
quently on some surface resulting in 
a texture. [BB82:6.2] 


texture energy measure: A single- 
valued texture descriptor with strong 
response in textured regions. A texture 
descriptor may be formed by combin- 
ing the results of several texture energy 
measures into a vector. [JRL97] 


texture enhancement: A proce- 
dure analogous to edge-preserving 
smoothing in which texture bound- 
aries rather than edges are to be 
preserved. [Wei95] 


texture field grouping: See texture 
segmentation. 


texture field segmentation: See 
texture segmentation. 


texture gradient: The gradient of a single
scalar output $\nabla s(x, y)$ of a texture
descriptor. A common example is the 
scale output, for homogeneous tex- 
ture, whose texture gradient can be 
used to compute the foreshortening 
direction. [BB82:6.5] 


texture grammar: Grammar used to 
describe textures as instances of sim- 
pler patterns with a given spatial 
relationship (including other textures
defined previously in this way). A 
sentence from this grammar would 
be a syntactic texture description. 
[BB82:6.3.1] 


texture mapping: In computer graph- 
ics, rendering a polygonal surface 
where the surface color at each out- 
put screen pixel is obtained by inter- 
polating values from an image. The 
source image pixel location is com- 
puted using correspondences between 
the polygon’s vertex coordinates and 
texture coordinates on the texture 
map. [WP:Texture_mapping] 


texture matching: Matching of regions 
based on texture descriptions. [SZ01] 


texture model: The theoretical basis for 
a class of texture descriptor. For exam- 
ple, autocorrelation of linear filter 
responses, statistical texture descrip- 
tions or syntactic texture descriptions. 
[PS00]


texture modeling: A family of techniques
for generating surface color texture and
shape texture, such as texture mapping
or bump mapping. Special classes, such
as hair or skin, can be modeled and the
texture can vary over time. The models
can be used to generate texture for
computer graphics, for a texture
descriptor, as a property used for
region segmentation etc. [Elb05]


texture motion: The blurring that is
produced when a textured surface moves
relative to the camera. This is mainly of
interest in computer graphics, for the
generation of realistic blurred textures
when seen by a moving observer.


texture orientation: See texture
gradient.


texture primitive: A basic unit of
texture (e.g., a small pattern that is
repeated) as used in syntactic texture
descriptions. [LWY99]


texture recognition: See texture
classification.


texture region extraction: See texture
field segmentation.


texture representation: See texture
model.


texture segmentation: Segmentation of
an image into patches of coherent
texture. The figure shows a region
segmented into three regions based on
color and shape texture: [FP03:Ch. 9]


texture synthesis: The generation of
synthetic images of textured scenes.
More particularly, the generation of
images that appear perceptually to
share the texture of a set of training
examples of a texture. [FP03:Ch. 9]


Theil-Sen estimator: A robust
estimator for curve fitting. A family
of curves is parameterized by
parameters $a_{1..p}$, and is to be fit to
data $x_{1..n}$. If q is the smallest number
of points that uniquely define $a_{1..p}$,
then the Theil-Sen estimate of the
optimal parameters $\hat{a}_{1..p}$ is the set
of parameters that has the median error
measure of all the q-point estimates.
For example, for line fitting, the
number of parameters (slope and
intercept) is p = 2 and the number of
points required to give a fit is also
q = 2. Thus the Theil-Sen estimate of the
slope gives the median error of the
$\binom{n}{2}$ two-point slope estimates.
The Theil-Sen estimator is not
statistically efficient, nor does it
have a particularly high breakdown
point, in contrast to such estimators as
RANSAC and least median of squares.
[AML95]
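
As an illustration of the Theil-Sen idea for line fitting (p = q = 2), the following minimal Python sketch takes the median of all two-point slope estimates. The function name `theil_sen_line`, the sample data and the choice of a median residual offset for the intercept are assumptions made for this example, not part of the definition above.

```python
from itertools import combinations
from statistics import median


def theil_sen_line(points):
    """Fit y = slope * x + intercept using medians over two-point fits."""
    # Each pair of points with distinct x gives one candidate slope (q = 2).
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if x2 != x1
    ]
    slope = median(slopes)
    # Assumed intercept rule: median offset of the data under that slope.
    intercept = median(y - slope * x for x, y in points)
    return slope, intercept


# Points on y = 2x + 1, with one gross outlier at x = 4.
data = [(0, 1), (1, 3), (2, 5), (3, 7), (4, 40), (5, 11)]
slope, intercept = theil_sen_line(data)
```

The single outlier affects only 5 of the 15 pairwise slopes, so the median slope is still that of the underlying line.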


thermal imaging: Acquiring image data 
in the infrared light range (approxi- 
mately 750 nanometers to 300 microm- 
eters wavelength). There are advan- 
tages to the use of infrared: 

• It allows passive sensing, even in the
dark.

• It allows measurement of the
temperature of objects, which can be
used for a variety of diagnostic
purposes, such as health or energy loss
monitoring.

[WP:Thermography] 


thermal noise: In CCD cameras, addi- 
tional electrons released by thermal 
vibration in the substrate that are 
counted with those released by inci- 
dent photons. Thus, the gray scale 
values are corrupted by an additive 
Poisson noise process.
[WP:Johnson-Nyquist_noise]


thickening operator: Thickening is a 
mathematical morphology operation 
that is used to grow selected regions 
of foreground pixels in binary images, 
somewhat like the dilate operator or 
the close operator. It has several
applications, including determining the
approximate convex hull of a shape 
and the skeleton by influence zone. 
Thickening is normally only applied to 
binary images and it produces a binary 
image as output. The figure shows 
thickening six times in the horizontal 
direction: [Jai89:9.9] 


thin plate model: A model of surface
smoothness used in the variational
approach. The internal energy (or
bending energy) of a thin plate
represented as a parametric surface
$(x, y, f(x, y))$ is given by
$f_{xx}^2 + 2 f_{xy}^2 + f_{yy}^2$.
[FP03:26.1.1]
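
The bending energy of the thin plate model can be approximated on a sampled surface with finite differences. The sketch below is one possible discretization, assuming a height grid `f[y][x]` with unit spacing; the function name and the test surfaces are hypothetical.

```python
def thin_plate_energy(f, h=1.0):
    """Sum f_xx^2 + 2 f_xy^2 + f_yy^2 over the interior of a grid f[y][x]."""
    rows, cols = len(f), len(f[0])
    total = 0.0
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            # Central second differences in x and y.
            fxx = (f[y][x + 1] - 2 * f[y][x] + f[y][x - 1]) / h**2
            fyy = (f[y + 1][x] - 2 * f[y][x] + f[y - 1][x]) / h**2
            # Mixed derivative from the four diagonal neighbors.
            fxy = (f[y + 1][x + 1] - f[y + 1][x - 1]
                   - f[y - 1][x + 1] + f[y - 1][x - 1]) / (4 * h**2)
            total += fxx**2 + 2 * fxy**2 + fyy**2
    return total


# A plane has no bending energy; a paraboloid f = x^2 + y^2 does.
plane = [[2 * x + 3 * y for x in range(5)] for y in range(5)]
bowl = [[x * x + y * y for x in range(5)] for y in range(5)]
plane_energy = thin_plate_energy(plane)
bowl_energy = thin_plate_energy(bowl)
```

For the paraboloid, each of the nine interior samples contributes 2^2 + 0 + 2^2 = 8.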


thinning operator: Thinning is a
mathematical morphology operation 
that is used to remove selected 
foreground pixels from binary images, 
somewhat like the erode operator or 
the open operator. It can be used for 
several applications, but is particularly 
useful for skeletonization and to tidy 
up the output of edge detection by 
reducing all lines to one-pixel thick- 
ness. Thinning is normally only applied 
to binary images and produces a binary 
image as output. The figure shows the 
thinning of a region: [JKS95:2.5.11] 


three-CCD camera: An imaging camera
that uses three separate CCD sensors to
capture the red, green and blue
components of an image, in contrast to a
single-CCD sensor that uses three
different filters on the single CCD chip
(e.g., using the Bayer pattern). The
advantage of using three sensors is
better color quality and spatial
resolution. [Sze10:pp. 85-86]


three-view geometry: See trinocular
stereo.


threshold selection: The automatic
choice of threshold values for
conversion of a scalar signal (such as a
gray scale image) to a binary image.
Often proceeds by analysis of the
histogram of the sample values (e.g.,
Otsu's 1979 method). Different
assumptions about the underlying
distributions yield different
strategies. [JKS95:3.2.1]


thresholding: Quantization into two
values. For example, conversion of a
scalar signal (such as a gray scale
image) to a binary image. The figure
shows an input image and its threshold
output: [JKS95:2.1, 3.2.1]


thresholding with hysteresis:
Thresholding of a time-varying scalar
signal where the threshold value is a
function of previous signal and
threshold values. For example a
thermostatic control based on
temperature receives a signal s(t) and
generates an output signal b(t) of the
form

$$b(t) = \begin{cases} s(t) > T_{cold} & \text{if } b(t-1) = 0 \\ s(t) < T_{hot} & \text{if } b(t-1) = 1 \end{cases}$$

where the value at time t depends on
the previous decision b(t - 1). In
computer vision, often associated with
the edge following stage of the Canny
edge detector. [NA05:4.2.5]


tie point: A pair of matched points in
two different images (or in an image
and a map etc.). Given enough tie
points, one can estimate geometric
relations between the images, such
as rectification or image alignment
geometric transformations, or the
fundamental matrix. [LRKH06:4.2.5.1]


TIFF: Tagged image file format.
[Umb98:1.8]


tilt: The tilt direction of a 3D surface
patch as observed in a 2D image is
parallel to the projection of the 3D
surface normal into the image. If the 3D
surface is represented as a depth map
z(x, y) in image coordinates, then the
tilt direction at (x, y) is the unit
vector parallel to
$(\partial z/\partial x, \partial z/\partial y)$.
The tilt angle may be defined as
$\tan^{-1}\left(\frac{\partial z/\partial y}{\partial z/\partial x}\right)$.
[FP03:9.4.1]
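
A minimal sketch of the tilt computation on a sampled depth map follows. It assumes a grid `z[y][x]` with unit pixel spacing and uses central differences for the partial derivatives; the function name is hypothetical, and `atan2` is used in place of a plain arctangent so that the quadrant of the angle is resolved.

```python
import math


def tilt(z, x, y):
    """Return (unit tilt direction, tilt angle in radians) at pixel (x, y)."""
    # Central differences approximate dz/dx and dz/dy.
    zx = (z[y][x + 1] - z[y][x - 1]) / 2.0
    zy = (z[y + 1][x] - z[y - 1][x]) / 2.0
    norm = math.hypot(zx, zy)
    if norm == 0.0:
        return None, None  # fronto-parallel patch: tilt is undefined
    return (zx / norm, zy / norm), math.atan2(zy, zx)


# Depth increasing equally in x and y: the tilt is along the image diagonal.
depth = [[x + y for x in range(3)] for y in range(3)]
direction, angle = tilt(depth, 1, 1)
```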


time delay index: An integer index
of the lag between two functions: if
$g(t) = f(t - n\Delta_t)$ then n is the
time delay index, $\Delta_t$ is the sample
time and $n\Delta_t$ is the total delay or
lag of g behind f.
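
One simple way to estimate a time delay index from two sampled signals is to try each candidate lag and keep the one with the smallest mean squared difference. The sketch below assumes this search strategy (cross-correlation is a common alternative); the function name, the signals and the sample time are illustrative assumptions.

```python
def time_delay_index(f, g, max_lag):
    """Return the integer n in [0, max_lag] minimizing mean (g[k] - f[k - n])^2."""
    best_n, best_cost = 0, float("inf")
    for n in range(max_lag + 1):
        # Compare only the samples where both g[k] and f[k - n] exist.
        diffs = [(g[k] - f[k - n]) ** 2 for k in range(n, len(g))]
        cost = sum(diffs) / len(diffs)
        if cost < best_cost:
            best_n, best_cost = n, cost
    return best_n


sample_time = 0.04  # hypothetical sample time in seconds
f = [0, 0, 1, 4, 9, 4, 1, 0, 0, 0, 0, 0]
g = [0, 0, 0, 0, 0, 1, 4, 9, 4, 1, 0, 0]  # f delayed by three samples
n = time_delay_index(f, g, max_lag=5)
total_delay = n * sample_time
```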

time derivative: A technique for com- 
puting how an image sequence 
changes over time. Typically used as 
part of shape from motion. [KWM94] 


time-of-flight range sensor: A sen- 
sor that computes distance to target 
points by emitting electromagnetic (or 
other) radiation and measuring the 
time between emitting the pulse and 
observing the reflection of the pulse. 
[BM02:1.9.2] 


time series: One or more vari- 
ables observed sequentially in time. 
[Cha89:p. 1] 


time to collision: See time to contact. 


time to contact: From a sequence of 
images I(t), computation of the value 
of t at which, assuming constant
motion, an image object will intersect 
the plane parallel to the image plane 
that contains the camera center. It can 
be computed even in the absence of 
metric information about the imaging 
system, i.e., in an uncalibrated vision 
setting. [CB92] 


time to impact: See time to contact. 


time-varying random process: A 
stochastic process in which the index 
variable is time. 


time-varying shape: A shape that varies 
over time or in an image sequence. 
Examples include the shape of a per- 
son as they walk, a person growing 
and the beating of a heart. A variety 
of deformable models can be used to 
model the shape, such as the active 
shape model. [SBS02]


tolerance band algorithm: An algo- 
rithm for incremental segmentation 
of a curve into straight-line elements. 
Assume that the current straight line 
segment defines two parallel bound- 
aries of a tolerance zone at a pre- 
selected distance from the line seg- 
ment. When a new curve point leaves 
the tolerance zone, the current line 
segment is ended and a new segment 
is started: [JKS95:6.4.2] 


[Figure labels: tolerance zone, exit point]
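
The tolerance band algorithm can be sketched in a few lines of Python. This version assumes that the current line is the one through the first two points of the current run and that consecutive curve points are distinct; a practical implementation might refit the line as points are added. All names are hypothetical.

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance from p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
    return abs(cross) / math.hypot(bx - ax, by - ay)


def tolerance_band_segments(curve, tol):
    """Split an ordered point list into runs that stay within tol of a line."""
    segments = []
    start = 0
    for i in range(2, len(curve)):
        # The current line is taken through the first two points of the run.
        if point_line_distance(curve[i], curve[start], curve[start + 1]) > tol:
            segments.append((start, i - 1))  # point i left the tolerance zone
            start = i - 1                    # the new run starts at the last inlier
    segments.append((start, len(curve) - 1))
    return segments


# An L-shaped curve: along the x axis, then straight up.
curve = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
segments = tolerance_band_segments(curve, tol=0.5)
```

Each returned pair gives the first and last point index of one straight-line element, with adjacent elements sharing their end point.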


tolerance interval: An interval within
which a stated proportion of some
population will lie.
[WP:Tolerance_interval]


Tomasi-Kanade factorization: A 
maximum-likelihood solution to 
structure and motion recovery in 
the situation where points in a static 
scene are observed by affine cameras 
and the observed (x, y) positions are 
corrupted by Gaussian noise. The 
method depends on the observation 
that if m points are observed over 
n views, the $2n \times m$ measurement
matrix containing all the observations 
(after certain transformations have 
been performed) is of rank 3. The 
closest rank-3 approximation of the 
matrix is reliably obtained via singular 
value decomposition, after which 
the 3D points and camera positions 
are easily extracted, up to an affine 
ambiguity. [TK92] 


tomography: A technique for the recon- 
struction of a 3D volumetric dataset 
based on a number of 2D slices. The 
most common examples occur in med- 
ical imaging (e.g., nuclear magnetic 
resonance and positron emission 
tomography). [WP:Tomography] 


tongue print: A form of personal bio- 
metric that is supposed to be unique 
to individuals, much like fingerprints. 
[ZLYS07] 


top-down: A reasoning approach that
searches for evidence for high-level 
hypotheses in the data. For exam- 
ple, a hypothesize-and-test algorithm 
might have a strategy for making good 
guesses as to the position of circles 
in an image and then compare the 
hypothesized circles to edges in the 
image, choosing those that have good 
support. Another example is a human 
body recognizer that employs body 
part (e.g., heads, legs and torso) rec- 
ognizers that directly use image data 
or recognize even smaller subparts. 
[BB82:10.4.2] 


top-down model inference: A form
of reasoning where the analysis pro- 
ceeds from the whole object to the 
subcomponents. High-level informa- 
tion generates predictions of struc- 
ture at lower levels and these predic- 
tions are then tested. For example, 
one could construct a statistical clas- 
sifier that assessed the likelihood that 
a region contained a person, before 
refining that region with a subclassi- 
fier that assessed subregions for the 
likelihood of containing a head, and 
then further, using a subclassifier that 
looked for an eye. Failure of the lower 
levels of inference might lead to failure 
at the top level. This contrasts with the 
bottom-up approach. [BB82:10.4, pp. 
343] 


top-hat operator: A mathematical
morphology operation used to remove
structures from images. The top-hat
filtering of image I by structuring
element S is the difference
I - open(I, S), where open(I, S) is the
morphological open operator of I by
S. [ZLP06]
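
The top-hat difference I - open(I, S) can be illustrated on a 1D gray scale signal. The sketch below assumes a flat structuring element of odd width and clips the window at the signal borders; the function names and the test signal are illustrative only.

```python
def erode(signal, width):
    """Flat erosion: minimum over a centered window (clipped at the borders)."""
    r = width // 2
    return [min(signal[max(0, i - r): i + r + 1]) for i in range(len(signal))]


def dilate(signal, width):
    """Flat dilation: maximum over a centered window (clipped at the borders)."""
    r = width // 2
    return [max(signal[max(0, i - r): i + r + 1]) for i in range(len(signal))]


def top_hat(signal, width):
    """Return I - open(I, S) for a flat structuring element S of odd width."""
    opened = dilate(erode(signal, width), width)  # opening = erosion, then dilation
    return [v - o for v, o in zip(signal, opened)]


# A slowly varying background with one narrow bright spike at index 4.
signal = [1, 1, 2, 2, 9, 2, 3, 3, 3]
result = top_hat(signal, width=3)
```

The opening removes the spike because it is narrower than the structuring element, so the difference keeps only the spike and suppresses the background.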


topic model: Usually described in
relation to a set of documents and the
words they contain. For computer 
vision, this could be translated to a set 
of images, each of which is described 
by a bag of features. The idea is to 
model the distribution of words in a 
document in terms of the proportions 
of a smaller number of topics. See also 
latent Dirichlet allocation (LDA) and 
probabilistic latent semantic analysis 
(PLSA). [Mur12:24.3] 


topographic map: 1) A representation
that shows both geometric shape and 
identified features, such as man-made 
structures in a land map. Heights are 
typically represented using elevation 
contours. This sort of map can also be 
used to represent the results of image 
analysis. 

2) In the nervous system, a layout of 
neurons where neural response prop- 
erties vary systematically with the spa- 
tial position of the neuron within 
the neural area.
[WP:Topographic_map]


topological property: Properties that
are not linked to the metric properties
of the space; in other words, they
are invariant to general geometric
transformations (e.g., rotation, scaling
and translation). Examples of
topological properties are adjacency,
connectedness, number of regions or
components etc. [Wei12:Topology]


topological representation: Any
representation that encodes
connectedness of elements. For example, in a
surface boundary representation com- 
prising faces, edges and vertices, the 
topology of the representation is the 
list of face-edge and edge-vertex con- 
nections, which is independent of the 
geometry (spatial positions and sizes) 
of the representation. In this case, the 
fundamental relation is “bounded by”, 
so a face is bounded by one or more 
edges and an edge is bounded by zero 
or more vertices. [CWSI87] 


topology: 1) Properties of point sets
(such as surfaces) that are unchanged 
by continuous reparameterizations 
(homeomorphisms) of space. 

2) The connectedness of objects in 
discrete geometry (see topological 
representation). One speaks of the 
topology of a network, meaning the 
set of connections within the network 
or, equivalently, the set of neighbor- 
hood relationships that describe the 
network. [Sch89:Ch. 6] 


topology inference: 1) Construction
of the graph of relations between 
image structures, such as the overlap 
between different images of the same 
portion of a scene. 


2) Construction of the graph of rela- 
tions between nodes in a probabilistic 
graphical model. 

3) Inference of the connectivity of a 
communications network. 

What these definitions have in com- 
mon is the inference of a graph or net- 
work from data. [SHK98] 


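torsion: A concept in the differential
geometry of curves formally
representing the intuitive notion of the
local twisting of a 3D curve as you move
along the curve. The torsion $\tau(t)$ of
a 3D space curve $\mathbf{c}(t)$ (for an
arc-length parameterization) is the
scalar

$$\tau(t) = -\mathbf{n}(t) \cdot \frac{d}{dt}\mathbf{b}(t) = \frac{[\dot{\mathbf{c}}(t), \ddot{\mathbf{c}}(t), \dddot{\mathbf{c}}(t)]}{\|\ddot{\mathbf{c}}(t)\|^2}$$

where $\mathbf{n}(t)$ is the curve normal
and $\mathbf{b}(t)$ the curve binormal. The
notation [x, y, z] denotes the scalar
triple product x · (y × z). [FP03:19.1.1]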
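
As a numerical check of the torsion of a space curve, the sketch below evaluates the scalar triple product of the first three derivatives divided by the squared norm of the cross product of the first two. This is an assumed general-parameter form, which reduces to division by the squared norm of the second derivative when the curve is parameterized by arc length. The helix and all names are illustrative.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def torsion(d1, d2, d3):
    """Torsion from the first three derivatives of a curve at one point."""
    c = cross(d1, d2)
    return dot(c, d3) / dot(c, c)  # [d1, d2, d3] / |d1 x d2|^2


# Helix (a cos t, a sin t, b t); its torsion is b / (a^2 + b^2) everywhere.
a, b, t = 2.0, 1.0, 0.7
d1 = (-a * math.sin(t), a * math.cos(t), b)
d2 = (-a * math.cos(t), -a * math.sin(t), 0.0)
d3 = (a * math.sin(t), -a * math.cos(t), 0.0)
tau = torsion(d1, d2, d3)
```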


torsion scale space (TSS): A descrip- 
tion of a space curve based on 
movement along the curve of the 
zero-crossing points of the torsion 
function of the curve as the smooth- 
ing scale increases. The smoothing is 
accomplished by Gaussian smoothing 
of the space curve with increasing 
scale. A visual representation of the TSS 
is typically organized as a 2D plot of the 
scale at which zero crossings appear as 
a function of the arc length. [MB03:3.4] 


torus: 1) The volume swept by moving a 
sphere along a circle in 3D. 
2) The surface of such a volume. 
[WP:Torus] 


total least squares: See orthogonal 
regression. 


total variation: A class of regularizer
in the variational approach. The
total variation regularizer of function
$f(x): \mathbb{R}^n \to \mathbb{R}$ is of the
form $R(f) = \int_\Omega |\nabla f(x)| \, dx$
where $\Omega$ is (a subset of) the
domain of f. [WP:Total_variation]
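
For a sampled 1D signal, the total variation integral reduces to a sum of absolute differences between neighboring samples. The following sketch assumes unit sample spacing; the function name and the two test signals are made up for illustration.

```python
def total_variation(signal):
    """Discrete 1D total variation: sum of |f[i+1] - f[i]|."""
    return sum(abs(b - a) for a, b in zip(signal, signal[1:]))


# A clean step edge and an oscillating version of the same step.
step = [0, 0, 0, 5, 5, 5]
noisy_step = [0, 1, 0, 1, 0, 5, 4, 5, 4, 5]
tv_step = total_variation(step)
tv_noisy = total_variation(noisy_step)
```

The clean edge costs only its height, while every oscillation adds to the total, which is why a total variation penalty tends to suppress noise while keeping sharp edges.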


total variation regularization: A type 
of regularization where the penalty 
term is of total variation form. 
[WP:Total_variation_denoising] 


tracking: A means of estimating the
parameters (e.g., feature point
positions, target object positions,
human joint angles) of a dynamic system
that evolve over time and for which
there are measurements (e.g.,
photographs) obtained at successive
time instants. The task of tracking is
to maintain an estimate of the
probability distribution over the model
parameters, given the measurements, as
well as a priori models of how the
parameters change over time. Common
algorithms for tracking include the
Kalman filter and particle filters.
Tracking may be viewed as a class of
algorithms that operate on sequences of
inputs, using assumptions about the
coherence of successive inputs to
improve performance of the algorithm.
Often the task of the algorithm may be
cast as estimation of a state vector - a
set of parameters, such as the joint
angles of a human body - at successive
time instants t. The state vector x(t)
is to be estimated using a set of
sensors that yield observations z(t),
such as the 2D positions of bright spots
attached to a human. In the absence
of temporal coherence assumptions, x
must be estimated at each time step
solely using the information in z(t).
With coherence assumptions, the system
uses the set of all observations so
far $\{z(\tau), \tau < t\}$ to compute the
estimate at time t. In practice, the
estimate of the state is represented as
a probability density over all possible
values, and the current estimate uses
only the previous state estimate
x(t - 1) and the current measurements
z(t) to estimate x(t). [FP03:Ch. 17]
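
The recursive structure of tracking, in which the previous estimate and the current measurement give the new estimate, is shown by the scalar Kalman filter sketched below. It assumes a random-walk motion model and Gaussian noise; the parameter names and the measurement values are hypothetical.

```python
def kalman_1d(measurements, x0, p0, q, r):
    """Track a scalar state assumed to follow a random walk.

    x0, p0: prior mean and variance; q: process noise variance;
    r: measurement noise variance.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the random-walk model keeps the mean and inflates the variance.
        p = p + q
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append((x, p))
    return estimates


# Noisy observations of a quantity near 10 (values made up for illustration).
z = [9.0, 11.0, 10.5, 9.5, 10.2]
track = kalman_1d(z, x0=0.0, p0=100.0, q=0.01, r=1.0)
final_mean, final_var = track[-1]
```

The state estimate is carried as a mean and a variance, i.e., a Gaussian probability density, and its variance shrinks as measurements accumulate.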


traffic analysis: Analysis of video data 
of automobile traffic, e.g., to iden- 
tify number plates, detect accidents or 
congestion, compute throughput etc. 
[KWH+94] 


traffic sign recognition: An algorithm 
for the recognition or classification of 
road traffic information signs, such as 
“stop” signs, from single images or 
video. Detection of the sign is usually 
a fundamental initial step. This capabil- 
ity is needed by autonomous vehicles. 
[WP:Traffic_sign_recognition] 


training set: The set of labeled exam- 
ples used to learn the parameters 
of a classifier. In order to build an 
effective classifier, the training set 
should be representative of the
examples that will be encountered in
the eventual domain of application. 
[WP:Training_set] 


trajectory: The path that a moving point
makes over time. It could also be
the path that a whole object takes
if less precision of usage is desired.
[WP:Trajectory]


trajectory-based representation: A
description of an object, vehicle or
behavior based on the trajectory or
path of the observed object. The path
may be described in either the image
or the scene space. A typical
application is the detection of
potential car crime activities in a
car-parking lot by watching for atypical
patterns of people walking. [BKS07]


trajectory estimation: Determination
of the 3D trajectory of an object 
observed in a set of 2D images. [LSG07] 


trajectory transition descriptor: A
descriptor of a trajectory and its state 
transitions in the form of a directed 
graph. The graph is transformed into a 
translation-invariant and scale-invariant 
form, with the resulting displacements 
between consecutive frames quantized 
into bins, which form the nodes of a 
graph. Consecutive bin positions along 
the trajectory become the arcs in the 
graph. Finally the arcs acquire a label 
according to the number of times the 
arc is traversed. The graph can be 
encoded as a transition matrix, from 
which a vector descriptor is obtained 
that encodes the trajectory. [GX11:p. 
153] 


transformation: A mapping of data in
one space (such as an image) into
another space. All image-processing
operations are transformations.
[WP:Transformation_(function)]


transformation matrix: A matrix (e.g.,
M) used in a linear space to transform a
vector (e.g., x) by multiplication (e.g.,
Mx). Transformation matrices can be
used for geometric transformations
(e.g., rotation) of position vectors,
color transformations of (e.g., RGB)
color vectors, projections from 3D into
2D etc. [Sze10:2.1]


translation: A transformation
of Euclidean space that can
be represented in the form
$x \mapsto T(x) = x + t$. In projective
space, a transformation that
leaves the plane at infinity pointwise
invariant. [BB82:A1.7.4]


translation invariant: A property that 
keeps the same value even if the data, 
the scene or the image from which 
the data comes is translated. The dis- 
tance between two points is translation 
invariant. [RP98] 


translucency: The transmission of light 
through a diffusing interface such as 
frosted glass. Light entering a translu- 
cent material has multiple possible exit 
directions. [WP:Translucence] 


transmittance: The ratio of the (“outgo- 
ing”) power transmitted by a transpar- 
ent object to the incident (“incoming”) 
power. [WP:Transmittance] 


transparency: The property of a sur- 
face to be traversed by radiation 
(e.g., by visible light), so that objects 
on the other side can be seen. A 
non-transparent surface is opaque. 
[WP:Transparency_and_translucency] 


transparent layer: An image display 
technique whereby an image is made 
partially transparent before it is over- 
laid on another image, thus allowing 
image data from both to be seen. A sim- 
ilar effect is occasionally seen in real 
scenes, e.g., when one can see reflec- 
tions on a window as well as the scene 
through the window. 


tree classifier: A classifier that applies 
a sequence of tests to input points x 
in order to determine the label / of 
the class to which it belongs. See also 
decision tree. [SL91] 


tree search method: A class of algo- 
rithms to optimize a function defined 
on tuples of values taken from a 
finite set. The tree describes the set 
of all such tuples and the order in 
which tuples are explored is defined 
by the particular search algorithm, 
such as depth-first, breadth-first, A* 
and best-first searches. Applications 
include the interpretation tree search. 
[WP:Tree_traversal]


triangulated model: See surface mesh. 


triangular norms: Also known as
T-norms. A family of binary operators
for combining [0,1] certainty values
in a form of AND or logical conjunction
suitable for uncertainty reasoning.
There are several possible T-norms
with different combining functions,
including the classical logical AND.
[WP:T-norm]


triangulation: See Delaunay
triangulation, surface triangulation,
stereo triangulation and structured
light triangulation.


trichromacy: Having three independent
color receptor channels, as in RGB for
cameras and long, medium and short
(LMS) wavelength receptors for the
primate retina. [FP03:6.2.2]


trifocal tensor: The geometric entity
that relates the images of 3D points
observed in three perspective 2D
views. Algebraically represented as a
3 × 3 × 3 array of values $T_i^{jk}$. If a
single 3D point projects to x, x', x'' in
the first, second, and third views
respectively, it must obey the nine
equations (using Einstein summation
notation over $\alpha$, $\beta$, i, j, k)

$$x^i \, (x'^j \epsilon_{j\alpha r}) \, (x''^k \epsilon_{k\beta s}) \, T_i^{\alpha\beta} = 0_{rs}$$

for r and s varying from 1 to 3;
$\epsilon$ is the epsilon tensor for which

$$\epsilon_{ijk} = \begin{cases} 1 & ijk \text{ an even permutation of } 123 \\ 0 & \text{two of } i, j, k \text{ equal} \\ -1 & ijk \text{ an odd permutation of } 123 \end{cases}$$

As this equation is linear in the
elements of T, it can be used to
estimate them given enough 2D point
correspondences x, x', x''. As not all
3 × 3 × 3 arrays represent realizable
camera configurations, estimation must
also incorporate several nonlinear
constraints on the tensor elements.
[FP03:10.2]


trilinear constraint: The geometric
constraint on three views of a point
(i.e., the intersection of three epipolar
lines). This is similar to the epipolar
constraint which is applied in the
two-view scenario. [FP03:10.2.1-10.2.3]


trilinear tensor: Another name for the 
trifocal tensor. [FP03:10.2] 


trilinearity: An equation in a set of three 
variables in which holding two of the 
variables fixed yields a linear equation 
in the remaining one. For example 
xyz = 0 is trilinear in x, y and z, while 
$x^2 = y$ is not, as holding y fixed yields
a quadratic in x. [HZ00:14.2.1] 


trimap: An image segmentation,
often manual, into three regions
of foreground, background and
unknown. The trimap can be used as a
shape prior to guide segmentation or
labeling. [RRRS08]


trinocular stereo: A multi-view stereo 
process that uses three cameras. 
[Fau93:6.9] 


tristimulus theory of color per- 
ception: The human visual system 
has three types of cone, with three 
different spectral response curves, so 
that the perception of any incident 
light is represented as three intensities, 
roughly corresponding to long (max- 
imum about 558-580 nm), medium 
(531-545 nm) and short (410-450 nm)
wavelengths.
[WP:CIE_1931_color_space#Tristimulus_values]


tristimulus values: The relative 
amounts of the three primary colors 
that need to be combined to match a 
given color. [Jai89:3.8] 


true negative: A binary classifier c(x) 
returns + or - for an example x. A 
true negative occurs when the classi- 
fier returns - for an example that is in 
reality -. Compare with false negative. 
[Mur12:8.3.4] 


true positive: A binary classifier c(x) 
returns + or - for an example x. A 
true positive occurs when the classi- 
fier returns + for an example that is in 
reality +. Compare with false positive. 
[Mur12:8.3.4] 
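
The four outcomes of a binary classifier, including the true negatives and true positives defined above, can be counted as in the sketch below. The label lists are invented for illustration and the function name is hypothetical.

```python
def confusion_counts(predicted, actual):
    """Count TP, FP, TN, FN for labels given as '+' or '-'."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for p, a in zip(predicted, actual):
        if p == "+":
            # Classifier said +: true positive if reality is +, else false positive.
            counts["TP" if a == "+" else "FP"] += 1
        else:
            # Classifier said -: true negative if reality is -, else false negative.
            counts["TN" if a == "-" else "FN"] += 1
    return counts


predicted = ["+", "+", "-", "-", "+", "-"]
actual = ["+", "-", "-", "+", "+", "-"]
counts = confusion_counts(predicted, actual)
```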


truncated median filter: An approxi- 
mation to mode filtering when image 
neighborhoods are small. The filter 
uses an image sharpening operator 
on blurred image edges as well as
noise reduction methods. The algo- 
rithm truncates the local distribution 
on the mean side of the median 
and then recomputes the median of 
the new distribution. The algorithm 
can iterate and, under normal circum- 
stances, converges approximately to 
the mode even if the observed distri- 
bution has very few samples with no 
obvious peak. [Dav90:3.4] 
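
A rough sketch of the truncated median idea on a single neighborhood follows. The truncation rule used here, cutting the tail on the mean side so that the median sits mid-range in the remaining values, is an assumption made for illustration, and the function name and sample values are hypothetical.

```python
from statistics import mean, median


def truncated_median(values, iterations=1):
    """Approximate the mode by truncating the tail on the mean side."""
    data = sorted(values)
    for _ in range(iterations):
        med, avg = median(data), mean(data)
        if avg > med:
            # Long tail above the median: cut so the median is mid-range.
            limit = 2 * med - data[0]
            data = [v for v in data if v <= limit]
        elif avg < med:
            # Long tail below the median: cut symmetrically on the low side.
            limit = 2 * med - data[-1]
            data = [v for v in data if v >= limit]
        else:
            break  # mean equals median: nothing to truncate
    return median(data)


neighbourhood = [1, 2, 2, 2, 3, 5, 8, 9, 10]
plain = median(neighbourhood)
truncated = truncated_median(neighbourhood)
```

In this example the plain median is pulled toward the long upper tail, while the truncated median moves back to the most frequent value.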


tube camera: See tube sensor. 


tube sensor: A vacuum tube with a pho- 
toconductive window that converts 
light to a video signal. Once the only 
type of light sensor, the tube camera 
is now largely superseded by the CCD 
but it remains useful for some high 
dynamic range imaging. The image 
orthicon tube (or “immy”) is remembered
in the name of the Emmy awards of the US
Academy of Television Arts and Sciences.
[Gal90:2.1.3]


twist: A 3D rotation representation com- 
ponent that specifies a rotation about 
the vector defined by the azimuth and 
elevation. The figure shows the pitch 
rotation direction: [SHKM92] 


twisted cubic: The curve $(1, t, t^2, t^3)$
in projective 3-space, or any projective
transformation thereof. The general
form is thus

$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{pmatrix} \begin{pmatrix} 1 \\ t \\ t^2 \\ t^3 \end{pmatrix}$$

The projection of a twisted cubic into
a 2D image is a rational cubic spline.
[HZ00:2.3]


two-view geometry: See binocular
stereo.


type I error: A true hypothesis that has 
been rejected. [Nal93:3.1.1] 


type II error: A false hypothesis that has
been accepted. [Nal93:3.1.1] 


ultrasonic imaging: Creation of images 
by the transmission and recording of 
reflected ultrasonic pulses. A phased 
array of transmitters emits a set of 
pulses, and then records the returning 
pulse intensities. By varying the rela- 
tive timings of the pulses, the returned 
intensities can be made to corre- 
spond to locations in space, allow- 
ing measurements to be taken from 
within ultrasonic-transparent materials 
(including the human body, excluding 
air and bone). [FP03:18.6.1] 


ultrasound sequence registration: 
Registration of overlapping ultrasound 
images. [FP03:18.6] 


ultraviolet (UV): Description of electro- 
magnetic radiation with wavelengths 
between about 300-420 nm (near 
ultraviolet) and 40-300 nm (far ultra- 
violet). The short wavelengths make it 
useful for fine-scale examination of sur- 
faces. Ordinary glass is opaque to UV 
radiation; quartz glass is transparent. 
Often used to excite fluorescent mate- 
rials. [Hec87:3.6.5] 


umbilic: A point on a surface where the 
curvature is the same in every direc- 
tion. Every point on a sphere is an 
umbilic point. [JKS95:13.3.2] 


umbra: The completely dark area of a 
shadow caused by a particular light 
source (i.e., where no light falls from 
the light source): [FP03:5.3.2] 


[Figure labels: light source, no shadow,
fuzzy shadow, umbra (complete shadow)]


uncalibrated approach: See uncalib- 
rated vision. 


uncalibrated camera: This term is
typically encountered in applications 
involving projective geometry where 
there is no camera calibration. The 
typical calibration parameters are the 
extrinsic parameters (e.g., position 
and orientation) and the intrinsic 
parameters (e.g., focal length, scene- 
to-sensor scaling and lens distortion). 
The assumption is that one can recover 
the desired information, e.g., scene 
geometry, without having to explic- 
itly calibrate the camera. For exam- 
ple, it might be possible to effectively 
infer the calibration by exploiting con- 
straints implicit in the data. [BCB97] 


uncalibrated stereo: Stereo reconstruction
performed without precalibration
of the cameras. Particularly, given a 
pair of images taken by unknown cam- 
eras, the fundamental matrix is com- 
puted from point correspondences, 
after which the images may be subject 
to stereo image rectification and con- 
ventional calibrated stereo may pro- 
ceed. The results of uncalibrated stereo 
are 3D points in a projective coordi- 
nate system, rather than the Euclidean 
coordinate system that a calibrated 
setup admits. [Fau92] 


uncalibrated vision: The class of vision
techniques that require no quantita- 
tive information about the camera used 
in capturing the images on which 
they operate. These techniques can be 
applied to archive footage. In partic- 
ular, they can be applied to geomet- 
ric problems, such as stereo recon- 
struction, that traditionally required 
that the images be from a cam- 
era system upon which calibration 
measurements had been made. In 
some uncalibrated approaches (such 
as uncalibrated stereo), the traditional 
calibration step is replaced by
procedures that can use image
features directly; in others (such as
time-to-contact computations), the cal- 
ibration parameters can be factored 
out. In general, uncalibrated systems 
have degrees of freedom that cannot 
be measured, such as overall scale or 
projective ambiguity. [Har92] 


uncertainty: Having limited knowledge, 
so that it is not possible to specify 
exactly the current state or a future out- 
come. Probability theory provides one 
framework for reasoning under uncer- 
tainty. [KF09:p. 2]


uncertainty representation: A strategy 
for representation of the probability 
density of a variable as used in a vision 
algorithm. In a similar manner, an inter- 
val can be used to represent a range of 
possible values. [Nal93:8.3] 


under-fitting: Assume a set of models 
for a data set with increasing model 
complexity, e.g., arising from polyno- 
mial regression, where the order of the 
polynomial determines the complex- 
ity, or from a multilayer perceptron 
network that has an increasing num- 
ber of hidden units. As the model com- 
plexity varies from low to high, the 
model will initially under-fit the data 
generator, in that it does not have suf- 
ficient flexibility to model the struc- 
ture in the data. After passing through 
a point of optimum complexity, the 
model will then start over-fitting the 
data. [Mur12:1.2.6] 


under-segmented: Given an image 
where a desired segmentation result is 
known, the algorithm under-segments 
if regions output by the algorithm are 
generally the union of many desired 
regions. This image should be seg- 
mented into three regions but it was 
under-segmented into two regions: 
[SQ04:8.7] 


undirected graph: A graph in which the
arcs go in both directions (in contrast 
to a directed graph). Adjacency is a 
property that can be used in an undi- 
rected graph: adj(A,B) means region 
A is adjacent to region B and implies 
adj(B,A), i.e., region B is adjacent to 
region A: [Wei12:Undirected Graph]


[Figure panels: undirected graph, directed graph]


undirected graphical model (UGM):
Defines a joint probability distribution 
over a set of random variables (denoted 
X). The graphical structure is an 
undirected graph. The joint distribu- 
tion is defined by a product of positive 
factors, one for each maximal clique in 
the graph, as p(X) = 5 Te WC). Here 
W(X) denotes the clique function for 
clique c, and Z is the partition function 
or normalization constant required to 
make p(x) sum to 1. In the figure, 
the maximal cliques are (1, 2), (2, 3, 5),
(3, 4), (4, 6) and (5, 6):


Because the potentials $\psi_c(\mathbf{x}_c)$ are
strictly positive, it is convenient
to express them as exponentials,
i.e., $\psi_c(\mathbf{x}_c) = \exp(-E_c(\mathbf{x}_c))$. With this
representation we obtain
$p(\mathbf{x}) = \frac{1}{Z} \exp(-\sum_c E_c(\mathbf{x}_c))$, which is known
as the Boltzmann distribution. A UGM
can also be called a Markov random
field (MRF); the Markov term here
relates to the fact that the conditional
distribution of variable $x_i$, given the
other variables, depends only on its 
neighbors in the graph. One common 


use of a UGM in computer vision is 
with a grid graphical structure corre- 
sponding to nearest neighbor interac- 
tions between pixels, with a variable 
at each node (e.g., a binary variable 
indicating a foreground or background 
label). The pairwise interactions typi- 
cally encode the regularity that nearby 
sites will have the same label. See 
also conditional random field (CRF). 
[Mur12:6.3] 
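
As a minimal sketch of the Boltzmann form, the following Python enumerates every labeling of a tiny model and normalizes by the partition function Z. The three-node chain, the binary labels, the disagreement penalty `BETA` and all function names are assumptions chosen for illustration; they are not taken from the entry or its figure.

```python
import itertools
import math

# Hypothetical 3-node chain x0 - x1 - x2 with binary labels
# (0 = background, 1 = foreground). Its maximal cliques are the two edges.
EDGES = [(0, 1), (1, 2)]
BETA = 1.0  # assumed penalty when neighbouring labels disagree


def clique_energy(a, b, beta=BETA):
    """Pairwise energy E_c: 0 if the two labels agree, beta otherwise."""
    return 0.0 if a == b else beta


def total_energy(x):
    """Sum of the clique energies for one complete labeling x."""
    return sum(clique_energy(x[i], x[j]) for i, j in EDGES)


def boltzmann(n_nodes=3):
    """Return {labeling: probability} by brute-force enumeration."""
    states = list(itertools.product([0, 1], repeat=n_nodes))
    weights = {x: math.exp(-total_energy(x)) for x in states}
    z = sum(weights.values())  # partition function Z
    return {x: w / z for x, w in weights.items()}


probs = boltzmann()
```

Labelings in which neighbors agree, such as (0, 0, 0), receive more probability than labelings with disagreements, which is the regularity the pairwise terms are meant to encode. Brute-force enumeration is only feasible for very small graphs.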


uniform distribution: A probability 
distribution in which a variable can 
take any value in the given range 
with equal probability.
[WP:Uniform_distribution_(continuous)]


uniform illumination: An idealized 
configuration in which the arrange- 
ment of lighting within a scene is 
such that each point receives the same 
amount of light energy. In computer 
vision, sometimes uniform illumina- 
tion has a different meaning: that each 
point in an image of the scene (or a 
part thereof such as the background) 
has similar imaged intensity. [L083] 


uniform noise: Additive corruption of
a sampled signal. If the signal’s
samples are $s_i$ then the corrupted signal is
$\hat{s}_i = s_i + n_i$ where the $n_i$ are uniformly
randomly drawn from a specified range
$[a, b]$. [SS77]
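
A short sketch of this corruption model in Python, assuming the signal is held in a plain list. The function name `add_uniform_noise` and the optional `seed` argument are illustrative choices, not part of the entry.

```python
import random


def add_uniform_noise(samples, a, b, seed=None):
    """Return samples with noise drawn uniformly from [a, b] added to each one."""
    rng = random.Random(seed)  # a seed makes the sketch repeatable
    return [s + rng.uniform(a, b) for s in samples]


clean = [0.0] * 100
noisy = add_uniform_noise(clean, -0.5, 0.5, seed=0)
```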


unimodal: A function or distribution 
with a single peak: [Wei12:Unimodal]


uniqueness stereo constraint: When 
performing stereo matching or stereo 
reconstruction, matching can be sim- 
plified by assuming that a point in one 
image corresponds to only one point 
in other images. This is generally true, 
except at object boundaries and other 
places where pixels are not completely 
opaque. [Fau93:6.2.2] 


unit ball: An n-dimensional sphere of 
radius 1. [Nal93:2.2] 


unit quaternion: A four-vector $q \in \mathbb{R}^4$.
Quaternions of unit length can be used
to parameterize 3D rotation matrices.
Given a quaternion with components
$(q_0, q_1, q_2, q_3)$ the corresponding rotation
matrix $R$ (letting $S = q_0^2 - q_1^2 - q_2^2 - q_3^2$) is:

$$R = \begin{pmatrix}
S + 2q_1^2 & 2q_1 q_2 + 2q_0 q_3 & 2q_3 q_1 - 2q_0 q_2 \\
2q_1 q_2 - 2q_0 q_3 & S + 2q_2^2 & 2q_2 q_3 + 2q_0 q_1 \\
2q_3 q_1 + 2q_0 q_2 & 2q_2 q_3 - 2q_0 q_1 & S + 2q_3^2
\end{pmatrix}$$


The identity rotation is given by the
quaternion (1,0,0,0). The rotation
axis is the unit vector parallel to
$(q_1, q_2, q_3)$. [WP:Quaternion]
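
The conversion can be sketched in a few lines of Python. The function name is hypothetical, and the off-diagonal signs are taken to be those of the matrix in this entry; some references use the transposed convention. The input is normalized first as a precaution, which the entry itself does not require.

```python
import math


def quaternion_to_rotation(q0, q1, q2, q3):
    """Return the 3x3 rotation matrix (list of rows) for quaternion (q0, q1, q2, q3)."""
    # Normalize so that slightly non-unit input still yields a rotation.
    norm = math.sqrt(q0 * q0 + q1 * q1 + q2 * q2 + q3 * q3)
    q0, q1, q2, q3 = (c / norm for c in (q0, q1, q2, q3))
    s = q0 * q0 - q1 * q1 - q2 * q2 - q3 * q3
    return [
        [s + 2 * q1 * q1, 2 * q1 * q2 + 2 * q0 * q3, 2 * q3 * q1 - 2 * q0 * q2],
        [2 * q1 * q2 - 2 * q0 * q3, s + 2 * q2 * q2, 2 * q2 * q3 + 2 * q0 * q1],
        [2 * q3 * q1 + 2 * q0 * q2, 2 * q2 * q3 - 2 * q0 * q1, s + 2 * q3 * q3],
    ]


identity = quaternion_to_rotation(1.0, 0.0, 0.0, 0.0)
# Quarter turn about the third axis: (cos 45deg, 0, 0, sin 45deg).
quarter = quaternion_to_rotation(math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
```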


unit vector: A vector of length 1. 
[WP:Unit_vector] 


unitary transform: A reversible trans- 
formation (e.g., the discrete Fourier 
transform). $U$ is a unitary matrix where
$U^*U = I$, $U^*$ is the adjoint matrix and $I$
is the identity matrix. [Jai89:2.7, 5.2]


universal image quality index (UIQI):
A model for quantifying image 
distortion based on loss of correlation, 
luminance distortion and contrast dis- 
tortion, which correlates better with 
human perception of distortion than 
traditional sum-of-squared-error
methods. [WB02a]


unrectified: When a stereo camera pair 
has not been subject to stereo image 
rectification. [JKS95:12.5] 


unsharp operator: An image enhance- 
ment operator that sharpens edges 
by adding a version of an image put 
through a high-pass filter to itself. The 
high-pass filter is implemented by sub- 
tracting a smoothed version of the 
image yielding 


$I_{\mathrm{unsharp}} = I + \alpha (I - I_{\mathrm{smooth}})$


The figure shows an input image and 
its unsharped output: [Sch89:Ch. 4] 
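
The operator can be illustrated on a one-dimensional signal. This is a sketch under assumptions: the 3-tap box filter with replicated borders stands in for whatever smoothing is actually used, and the names `smooth` and `unsharp` are hypothetical.

```python
def smooth(signal):
    """3-tap box filter with replicated border samples (an assumed smoother)."""
    n = len(signal)
    out = []
    for i in range(n):
        left = signal[max(i - 1, 0)]
        right = signal[min(i + 1, n - 1)]
        out.append((left + signal[i] + right) / 3.0)
    return out


def unsharp(signal, alpha=1.0):
    """Add alpha times the high-pass residual (signal - smoothed) back to the signal."""
    blurred = smooth(signal)
    return [s + alpha * (s - b) for s, b in zip(signal, blurred)]


step = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
sharpened = unsharp(step)
```

On the step edge the output undershoots just before the edge and overshoots just after it, which is what makes the edge look sharper; flat regions are left unchanged.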


unsupervised behavior modeling:
A form of unsupervised learning
used for learning behavior models
based on examples, without human
intervention. For example, one could
build models based on clustering
trajectory-based representations. [XG08]


unsupervised clustering: An
unsupervised learning method that
outputs a clustering. [HTF08:14.3]


unsupervised feature selection:
Feature selection for an unsupervised
learning problem, where there is no
response or target variable. For
example, one may eliminate features that are
highly correlated or dependent with
the retained subset or select a subset
of features so as to maximize an index
of clustering performance. [MMP02]


unsupervised learning: Finding
interesting patterns or structure in the
data without teaching or a supervisory
signal to guide the search process.
Clustering, latent variable models and
dimensionality reduction are examples
of unsupervised learning. Contrast
with supervised learning. [Bis06:Ch. 1]


unusual behavior detection: The
discovery of unusual behavior, usually by
analysis of video data. Examples of
unusual behaviors include traffic
traveling in the wrong direction, people
fighting, people in inappropriate
locations (e.g., in secure areas of
prisons) etc. Models of normal behavior
can be created using supervised
classification, semi-supervised learning
or unsupervised behavior modeling
methods. See also anomaly detection.
[XG08]


updating eigenspace: Algorithms
for the incremental updating of
eigenspace representations. These
algorithms facilitate approaches such
as active learning. [CMW+97]


upsampling: Increasing the sampling
rate of a signal by creating more
samples. For example, doubling the size
of an image requires a 2:1 upsampling.
Upsampling may require interpolating
the existing samples. [Sze10:3.5.1]
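
A minimal sketch of 2:1 upsampling of a one-dimensional signal by linear interpolation. The function name `upsample_2x` and the choice to replicate the last sample at the border are assumptions made for the example.

```python
def upsample_2x(samples):
    """Double the sampling rate by inserting the midpoint after every sample."""
    out = []
    n = len(samples)
    for i, s in enumerate(samples):
        nxt = samples[min(i + 1, n - 1)]  # replicate the last sample at the border
        out.append(s)
        out.append((s + nxt) / 2.0)
    return out


doubled = upsample_2x([0.0, 2.0, 4.0])
```

For an image, the same step would be applied along rows and then along columns (bilinear interpolation); higher-order interpolators trade more computation for smoother results.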


USB camera: A camera conforming to 


the universal serial bus (USB) standard. 
[WP:Webcam] 


validation: Testing whether or not some 
hypothesis is true. See also hypothesize 
and verify. [WP:Validation] 


validation set: A data set used to compare
the performance of different models which have
been trained on a training set, in
order to carry out model selection. The
selected model is then used to make
predictions on the test set. If model
selection is instead carried out on the test set,
this is likely to produce a downward-biased
(i.e., optimistic) estimate of the true
generalization error. See also cross-validation.
[HTF08:7.2]


valley: A dark elongated object in a gray 
scale image, so called because it corre- 
sponds to a valley in the image viewed 
as a 3D surface or elevation map of 
intensity versus image position. [GP93] 


valley detection: An image processing 
operator that enhances linear features 
rather than light-to-dark edges. See also 
bar detector. [LLSV99] 


value quantization: When a continuous 
number is encoded as a finite number 
of integer values. A common example 
of this occurs when a voltage or cur- 
rent is encoded as integers in the range 
0-255. [Kam89] 
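
As an illustration, the sketch below maps a continuous value in an assumed input range to one of 256 integer levels. The function name, the clamping of out-of-range inputs and the uniform step size are choices made for the example, not requirements of the definition.

```python
def quantize(value, lo, hi, levels=256):
    """Map a continuous value in [lo, hi] to an integer in 0..levels-1."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    clamped = min(max(value, lo), hi)   # out-of-range inputs saturate
    step = (hi - lo) / levels           # width of one quantization bin
    index = int((clamped - lo) / step)
    return min(index, levels - 1)       # the top edge falls in the last bin


mid_level = quantize(2.5, 0.0, 5.0)
```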


vanishing line: The 2D line that is the 
image of the intersection of a 3D plane 
with the plane at infinity. The horizon 
line in an image is the intersection of 
the ground plane with the plane at 
infinity, just as a pair of railway lines 
meeting in a vanishing point is the 
intersection of two parallel lines and 
the plane at infinity. The figure shows 
the vanishing line for the ground plane 
with a road and railroad: [HZ00:7.6]


vanishing point: The image of the point
at infinity where two parallel 3D lines
meet. A pair of parallel 3D lines are
represented as $\vec{a} + \lambda\vec{n}$ and $\vec{b} + \lambda\vec{n}$. The
vanishing point is the image of the 3D
direction $(\vec{n}^\top, 0)^\top$. The figure shows the
vanishing points for a road and
railroad: [TV98:6.2.3]


[Figure labels: VANISHING POINT; VANISHING LINE]
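
In practice a vanishing point is often estimated in the image itself, as the intersection of the images of two parallel 3D lines. The sketch below does this with homogeneous coordinates, where both the line through two points and the intersection of two lines are cross products. The function names and the sample rail coordinates are hypothetical.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def line_through(p, q):
    """Homogeneous line through two image points given as (x, y)."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def vanishing_point(line_a, line_b, eps=1e-12):
    """Intersection of two image lines, or None if they are parallel in the image."""
    x, y, w = cross(line_a, line_b)
    if abs(w) < eps:
        return None
    return (x / w, y / w)


# Two rails that converge in the image (made-up pixel coordinates).
left_rail = line_through((0.0, 0.0), (4.0, 8.0))
right_rail = line_through((10.0, 0.0), (6.0, 8.0))
vp = vanishing_point(left_rail, right_rail)
```

With noisy line detections, more than two lines would normally be combined in a least-squares estimate rather than intersecting a single pair.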


variable focus: 1) A camera system with 
a lens that allows zoom to be changed 
under user or program control. 
2) An image sequence in which focal 
length varies through the sequence. 
[KH04] 


variance: The variance, denoted $\sigma^2$, of a
random variable $X$ is the expectation
value of the square of the deviation of
the variable from the mean. If $\mu$
is the mean, then $\sigma^2 = E[(X - \mu)^2]$.
[Bis06:1.2.2]
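
For a finite set of samples the expectation can be approximated by an average. The sketch below computes the mean squared deviation from the sample mean (dividing by n; an unbiased estimate from a sample would divide by n - 1 instead). The function name is an illustrative choice.

```python
def variance(samples):
    """Mean squared deviation of the samples from their mean (divides by n)."""
    n = len(samples)
    mean = sum(samples) / n
    return sum((x - mean) ** 2 for x in samples) / n


v = variance([1.0, 2.0, 3.0, 4.0])
```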

variational analysis: An extension to 


the calculus of variations to problems 
in optimization theory. [RW05] 


variational approach: Signal processing
expressed as a problem of variational
calculus. The input signal is
a function $I(t)$ on the interval
$t \in [-1, 1]$. The processed signal is a
function $P$ defined on the same interval that
minimizes an energy functional $E(P)$
of the form


$$E(P) = \int_{-1}^{1} f(P(t), P'(t), I(t)) \, dt.$$


The calculus of variations shows that
the minimizing $P$ is the solution to the
associated Euler-Lagrange equation


$$\frac{\partial f}{\partial P} = \frac{d}{dt} \frac{\partial f}{\partial P'}$$


In computer vision, the functional is
often of the form


$$E = \int \mathrm{truth}(P, I) + \lambda \times \mathrm{beauty}(P)$$


where the “truth” term measures
fidelity to the data and the “beauty”
term is a regularizer. These can be
seen in a specific example: smoothing.
In the conventional approach, smoothing
might be considered the result of
an algorithm, e.g., convolve the image
with a Gaussian kernel. In the variational
approach, the smoothed signal
$P$ is the signal that best trades off
smoothness, measured as the square of
the second derivative $\int (P''(t))^2 \, dt$, and
fidelity to the data, measured as the
squared difference between the input
and the output $\int (P(t) - I(t))^2 \, dt$, with
the balance chosen by a parameter $\lambda$:


$$E(P) = \int (P(t) - I(t))^2 + \lambda (P''(t))^2 \, dt$$


[TBAB98] 
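
The smoothing example can be sketched numerically. This is an illustration under assumptions, not the method of the cited reference: the signal is sampled on a regular grid, the second derivative is replaced by second differences, and the energy is minimized by plain gradient descent with a conservative fixed step size. All names are hypothetical.

```python
def energy(p, signal, lam):
    """Discrete E(P): squared data difference plus lam times squared second differences."""
    fidelity = sum((pi - si) ** 2 for pi, si in zip(p, signal))
    smoothness = sum((p[k - 1] - 2.0 * p[k] + p[k + 1]) ** 2
                     for k in range(1, len(p) - 1))
    return fidelity + lam * smoothness


def variational_smooth(signal, lam=1.0, iterations=500):
    """Minimize the discrete energy by gradient descent, starting from the input."""
    p = list(signal)
    n = len(p)
    # The Hessian eigenvalues are at most 2 + 32*lam, so this step cannot diverge.
    step = 1.0 / (2.0 + 32.0 * lam)
    for _ in range(iterations):
        grad = [2.0 * (p[i] - signal[i]) for i in range(n)]
        for k in range(1, n - 1):
            d = p[k - 1] - 2.0 * p[k] + p[k + 1]  # second difference at k
            grad[k - 1] += 2.0 * lam * d
            grad[k] -= 4.0 * lam * d
            grad[k + 1] += 2.0 * lam * d
        p = [p[i] - step * grad[i] for i in range(n)]
    return p


noisy = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
smoothed = variational_smooth(noisy, lam=1.0)
```

Larger values of `lam` favor smoothness over fidelity. A linear ramp has zero second differences, so it is already a minimizer and is returned unchanged.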


variational calculus: Synonym for 
calculus of variations, a field of math- 
ematics which deals with optimizing 
(maximizing or minimizing)
functionals. [Fox88]


variational mean field method: A
form of variational model in which the
target probability distribution is
factored into a product of terms which
are assumed to be independent. This
allows optimization of one term while
keeping the other marginal terms fixed
at their “mean field” value. [Bis06:10.1]


variational method: See variational 


approach. 


variational model: A family of
problem-solving methods that involve
formulating the problem numerically as a set
of constraints on the solution which
are integrated over the whole problem
space. Numerical minimization
methods are used to find a solution.
Examples include estimating optical flow,
the fundamental matrix and image
segmentation. [MS94]


variational problem: See variational 


approach. 


vector field: A multivalued function
$f : \mathbb{R}^n \to \mathbb{R}^m$. The figure shows the
2D-to-2D function $f(x, y) = (y, \sin x)$:

An RGB image $I(x, y) = (r(x, y), g(x, y), b(x, y))$
is an example of a 2D-to-3D
vector field. [WP:Vector_field]


vector quantization: Representation of 
a set of vectors by associating each pos- 
sible vector with one of a small set 
of “codebook” vectors. For example, 
each pixel in an RGB image has $256^3$
possible values, but one might expect 
that a particular image uses only a small 
subset of these values. If a 256-element 
colormap is computed, and each RGB 


value is represented by the nearest 
RGB vector in the colormap, the RGB 
space has been quantized into 256 ele- 
ments. Clustering of the data is often 
used to identify the best subsets for 
each element. [GG92:Ch. ] 
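
The colormap example can be sketched as a nearest-codeword lookup. The five-entry codebook below is a made-up stand-in for a 256-element colormap, and the function names are hypothetical; in practice the codebook would be learned, e.g., by clustering the pixel values.

```python
def nearest_codeword(vector, codebook):
    """Index of the codebook vector closest to `vector` (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(vector, codebook[k]))


def quantize_pixels(pixels, codebook):
    """Replace each RGB pixel by the index of its nearest codeword."""
    return [nearest_codeword(p, codebook) for p in pixels]


# Tiny illustrative codebook: black, red, green, blue, white.
CODEBOOK = [(0, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)]
pixels = [(250, 10, 5), (3, 2, 1), (200, 220, 210)]
indices = quantize_pixels(pixels, CODEBOOK)
```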


vehicle detection: An example of the 
object-recognition problem where the 
task is to find vehicles in video imagery. 
[SBM06] 


vehicle license (number) plate analy- 
sis: A visual system locates the license 
plate in a video image and then recog- 
nizes the characters. [HC04] 


vehicle tracking: An example of the 
tracking problem applied to images of 
vehicles. [FP03:17.5.1] 


velocity: Rate of change of position.
Generally, for a curve $\vec{x}(t) \in \mathbb{R}^n$ the velocity
is the n-vector $\partial \vec{x}(t) / \partial t$. [BB82:3.6.1]


velocity field: The image velocity of 
each point in an image. See also optical 
flow field. [Sch89:Ch. 5] 


velocity moment: A moment that
integrates information about region
velocity as well as position and shape
distribution. Let $m^i_{pq}$ be the pq-th central
moment of a binary region in the
i-th image. Then the Cartesian
velocity moments are defined as
$v_{pqrs} = \sum_i (\bar{x}_i - \bar{x}_{i-1})^r (\bar{y}_i - \bar{y}_{i-1})^s m^i_{pq}$, where
$(\bar{x}_i, \bar{y}_i)$ is the center of mass in the i-th
image. [SN01b]


velocity smoothness constraint: 
Changes in the magnitude or direction 
of an image’s velocity field occur 
smoothly. [Hil84] 


vergence: 1) The angle between the 
optical axes in a stereo system, when 
the two cameras fixate on the same 
scene point. 
2) The difference between the 
pan angle settings of two cameras. 
[FP03:11.2] 


vergence maintenance: The action of a
control loop that ensures that the
optical axes of two cameras - whose
positions are under program control -
are directed at the same scene point.
[OP89]


verification: In the context of object
recognition, a class of algorithms
aiming to test the validity of various
hypotheses (models) explaining the
data. Back projection is such a
technique, typically used with geometric
models. See also object verification.
[JKS95:15.1]


vertex: A point at the end of a line 
(edge) segment. Often vertices are 
common to two or more line segments. 
[Low91:10.2] 


video: 1) Generic term for a set of images 
taken at successive instants with small 
time intervals between them. 
2) The analog signal emitted by a video 
camera. Each frame of video corre- 
sponds to about 40 ms of electrical sig- 
nal that encodes the start of each scan 
line, the image of each video scan line, 
and synchronization information. 
3) A video recording. [Umb98:1.4] 


video analysis: A general term for 
extracting information from a video, 
which includes extracting measure- 
ments related to the objects (such as 
sports players) being observed in the 
video, video clip categorization, cut 
detection, video key frame detection 
etc. [WP:Video_content_analysis] 


video annotation: The association of 
symbolic objects, such as text descrip- 
tions or index terms, with frames of 
video. [FML04] 


video camera: A camera that records 
a sequence of images over time. 
[FP03:Ch. 1] 


video clip categorization: Determining 
which category the content of a video 
belongs to, such as sports video, action 
video, outdoor scenery etc. [BCC08] 


video coding: The conversion of video 
to a digital bitstream. The source 
may be analog or digital. Generally, 
coding also compresses or reduces the 
bitrate of the video data.
[WP:Data_compression#Video]


video compression: Video coding with 
the specific aim of reducing the 
number of bits required to repre- 
sent a video sequence. Examples 
include MPEG, H.263 and DIVX. 
[WP:Data_compression#Video] 


video content analysis: A collection
of approaches to extracting different
types of information from videos, such
as the main characters, sports behavior
quantization, types of scene, types
of technique used, appearance of logos
etc. [WP:Video_content_analysis]


video corpus: A collection or body
(Latin corpus) of videos or video clips,
typically to be analyzed, e.g., to see
the frequency of some occurrence.
The videos may be subject to video
annotation. [AQ08]


video deinterlacing: Traditional
broadcast video transmitted each
frame as two video fields, consisting
of the odd lines of the frame and then
the even lines. Deinterlacing is the
process of combining the two fields
back to a single image, which can
then be the subject of progressive
image transmission. One difficulty
that can arise is if the two fields were
captured at slightly different times,
thus creating pixel jitter on moving
objects. [Sze10:8.4.3]
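
The simplest form of deinterlacing, often called weaving, just interleaves the rows of the two fields. The sketch below assumes each field is a list of image rows and that the odd field holds lines 1, 3, 5, ... (counting from 1); the function name is hypothetical, and the sketch makes no attempt to correct the motion jitter mentioned above.

```python
def weave(odd_field, even_field):
    """Interleave two fields (lists of rows) into one progressive frame."""
    frame = []
    for odd_row, even_row in zip(odd_field, even_field):
        frame.append(odd_row)    # lines 1, 3, 5, ...
        frame.append(even_row)   # lines 2, 4, 6, ...
    return frame


frame = weave([[1, 1], [3, 3]], [[2, 2], [4, 4]])
```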


video descriptor: 1) A set of properties 


that summarize the content of a video, 
such as a color histogram, statistics on 
the number of scenes or the amount of 
motion observed etc. 

2) A specialized code for the format of 
the video, e.g., 720p which stands for 
a 1280 × 720, 60 Hz progressive scan
video stream.
[WP:Extended_display_identification_data]


video deshearing: Shearing can occur
in an image or video when the camera
(or scene) moves while progressively
capturing a video. The result is “tilted”
structure in the image and perpendicular
scene structures are no longer
observed as perpendicular. Deshearing
removes this shear. The figure shows
an image before and after deshearing:
[Sze10:8.2]

[Figure panels: BEFORE; AFTER]


video error concealment: Compressed
video is typically transmitted in blocks,
and individual blocks can be lost in
transmission. The loss of data will
affect the reconstruction of the video
frame, which can lead to blank
sections in a video frame. Error
concealment attempts to reconstruct the
missing data, e.g., by using data from
the previous or succeeding frame. As
the repairs are only estimated, this
can lead to blocking artifacts or glitch
artifacts. Video error concealment
attempts both to replace the missing
data and to correct any introduced
artifacts. [WP:Packet_loss_concealment]


video indexing: Video annotation with 


the aim of allowing queries of the form 
“At what frame did event x occur?” or 
“Does object x appear?”. [SZ94] 


video key frame: 1) In video 


compression, a frame which is 
stored completely, and which is then 
used as the basis for incremental 
compression. New key frames are 
needed when there is a large change 
in the viewpoint, a significant scene 
change, etc. 

2) In video analysis, key frames are 
detected under the same criteria as 
for compression; they can be used for 
understanding the action or for video 
summarization. 

3) In computer animation, key frames 
define the start and end of a smooth 
transformation which can be interpo- 
lated. [FP03:p. 311] 


video mining: A general term used for 


extracting information from a video, 
which could include detection of 
suspicious objects (e.g., abandoned 
objects) or inspection of manufactured 
objects. A more advanced use is to dis- 
cover patterns in a video or collection 
of videos, such as analysis of traffic 
patterns to enhance safety, statistical 


analysis of sports performances, iden- 
tifying the cast of a video, quantifying 
customer behavior in a store etc. 
[RDD03] 


video mosaic: 1) A collection of video 
key frames which creates a video 
summarization of the action in the 
video. 
2) A collage of videos (or still images) 
used to create a video in the same 
way that individual images can be 
combined to create a static image 
mosaic, usually for artistic or
entertainment effect.
[WP:Photographic_mosaic]


video quality assessment: When video 
data is transformed from one encoding 
to another or video compression is 
applied, there is the possibility that 
the quality is reduced. Quality assess- 
ment measures the amount of degra- 
dation; it may have both a subjec- 
tive and an objective component. 
[Win05] 


video rate system: A realtime
processing system that operates
at the frame rate of the ambient video
standard. Typically 25 or 30 frames
per second, 50 or 60 fields per second.
[KYO+96]


video restoration: Application of image
restoration to video, often making use
of the temporal coherence of video, or
correcting for video-specific
degradations. [ZYZH10]


video retrieval: The selection of a video
from a database of videos. Retrieval
could be based on associated
keywords, text or metadata, or by similar
video content (general interest, sports,
or even a specific team or player).
[EKO+04]


video screening: 1) A public
presentation of a video.
2) Reviewing the content of a video,
e.g., for the detection of offensive
material.


video search: 1) Discovering and
collecting links to videos across the web.
2) Locating videos inside a database of
videos (see video retrieval).
3) Looking for specific content within
a video. [WP:Video_search_engine]


video segmentation: Application of
segmentation to video, either with the
requirement that the segmentation
exhibit the temporal coherence of the
original footage, or with the aim of
splitting the video sequence into different
groups of consecutive frames, e.g., when
there is a change of scene. [BW98]


video semantic content analysis:
Analysis that gives a description of the
content of a video in terms of actors,
objects, relationships or activities, in
contrast to more numerical quantities
such as amount of movement, number
of frames or color histograms. [BLJS07]


video sequence: See video. [WP:Video]


video sequence synchronization: The
frame-by-frame registration of two (or
more) video sequences, or of matching
components of the content observed
in them. Applications include visual
surveillance from two viewpoints and
wide baseline stereo. If the sequences
have a different sample rate then
dynamic time warping can be used.
[TG04]


video stabilization: A set of methods 


compensating for the observed jit- 
ter blur in a video caused by minor 
motions of the camera, particularly if it 
is handheld. The primary methods are 
to slightly move the lens, the imaging 
chip or the digitized image to reduce 
motion. [Sze10:8.2.1] 


video stream: The sequence of individ- 


ual images captured by a video camera. 
The stream may be raw or compressed 
images. 


video structure parsing: Exploiting the 


known a priori structure and proper- 
ties of a video to classify segmented 
shots. The a priori structure is found 
in video streams with a distinct sequen- 
tial or hierarchical structure, where 
major scenes can be predicted and 
detected, e.g., television news or 
weather reports. [LTZ96] 


video summarization: This produces 


a short image-based summary of the 
important events of a video, e.g., by 
extracting the video key frames with- 
out repetition while maintaining their 
order. An example of this is a short
video of sporting match highlights.
[CF02]


video texture: A potentially infinitely 
variable video constructed by ran- 
domly concatenating segments of a real 
video at frames where the connection 
is not noticeable. This is used by com- 
puter graphics to generate dynamic 
non-repeating image motion. [SSSE00] 


video thumbnail: A small image 
(thumbnail) that gives an impression
of the content of a video. There are 
many software packages to do this. See 
Google’s Videos page for examples. 


video transmission format: A descrip- 
tion of the precise form of the 
analog video signal coding conven- 
tions in terms of duration of com- 
ponents such as number of lines, 
number of pixels, front porch, 
sync and blanking.
[WP:Moving_image_formats#Transmission]


vidicon: A type of tube camera, suc- 
cessor of the image orthicon tube. 
[Sch89:Ch. 8] 


view-based object recognition: Recog- 
nition of 3D objects using multiple 2D 
images of the objects rather than a 3D 
model. [BSB+96] 


view combination: A class of tech- 
niques combining prototype views 
linearly to form appearance models. 
See also eigenspace-based recognition, 
prototype, representation. [Ull98]


view-dependent reconstruction: The 
process of creating a new image or 
video from an arbitrary viewpoint, 
given two or more original images 
or videos, usually using a combina- 
tion of projective geometry and stereo 
fusion techniques. A typical applica- 
tion allows a viewer to change view- 
point while watching or replaying a 
sporting event. [CTMS03] 


view integration: Creation of a com- 
posite image (or video) by mosaicing 
several overlapping images (or video 
streams). 


view interpolation: A family of tech- 
niques for synthesizing a new view 
of an object or scene based on two 
or more previously captured views.
Techniques for doing this include
simple interpolation using feature point
correspondence between images and
projective transfer using the trifocal
tensor. [CW93]


view-invariant action recognition: 
Many action recognition techniques 
are based on models created from data 
from a given viewpoint, which limits 
their use to video seen only from the 
same viewpoint. Viewpoint-invariant 
techniques construct models that can 
be used to recognize actions indepen- 
dent of the viewpoint. [GM11] 


view selection: 1) See viewpoint
selection.
2) Some object recognition or action
recognition algorithms store multiple
models of the object or action from
different viewpoints. They must then
select the view best suited for the
current recognition task. [HZK08]


view volume: The infinite volume of 
3D space bounded by the camera’s 
center of projection and the edges of 
the viewable area on the image plane. 
The volume might also be bounded 
near and far by other planes because of 
focusing and depth of field constraints. 
This figure illustrates the view volume: 
[JKS95:8.4] 


[Figure labels: VIEW VOLUME; CENTER OF PROJECTION]


viewer-centered representation: A 
representation of the 3D world that 
an observer (e.g., a robot or a 
human) maintains. The global coor- 
dinate system is maintained on the 
observer, and the representation of 
the world changes as the observer 
moves. Compare with object-centered 
representation. [TV98:10.6.2] 


viewfield: See field of view. 


viewing space: The set of all possible
locations from which an object or
scene could be viewed. Typically these
locations are grouped to give a set of
typical or characteristic views of the
object. If orthographic projection is
used, then the full 3D space of views
can be simplified to a viewsphere.
[Bli82]


viewpoint: The position and
orientation of the camera when an image
was captured. The viewpoint may be
expressed in absolute coordinates or
relative to some arbitrary coordinate
system, in which case the relative
position of the camera and the scene (or
other cameras) is the relevant quantity.
[WP:Camera_angle#Angles_and_their_effects]


viewpoint consistency constraint:
Lowe’s term for the concept that a 3D
model matched to a set of 2D line
segments must admit at least one 3D
camera position that projects the 3D model
to those lines. Essentially, the 3D and
2D data must allow pose estimation.
[Low87b]


viewpoint-dependent representations:
See viewer-centered
representation.


viewpoint invariance: A property that
has the same value or a process that
performs at the same level
independent of the viewpoint from which
the data was taken, e.g., projective
invariants or face recognition using 3D
models. [WCL+08]


viewpoint planning: Deciding where
an active vision system will look next,
in order to maximize the likelihood of
achieving some preset goal. A common
example is computing the location of a
range sensor in several successive
positions in order to gain a complete 3D
model of a target object. After n
pictures have been captured, the
viewpoint planning problem is to choose
the position of picture n + 1 in order
to maximize the amount of new data
acquired, while ensuring that the new
position will allow the new data to
be registered to the n existing images.
[MC97]


viewpoint selection: When rendering
a view or acquiring a 3D model,
it is important to choose a view
that is highly informative, which is
the goal of viewpoint selection. It
is also important to choose a good
set of source views for image-based
rendering. [VFSH01]


viewsphere: The set of camera positions 
from which an object can be observed. 
If the camera is orthographic, the view- 
sphere is parameterized by the 2D set 
of points on the 3D unit sphere. At 
the camera position corresponding to 
a particular point on the viewsphere, 
all images of the object caused by cam- 
era rotation are related by a 2D-to-2D 
image transformation, i.e., no parallax 
effects occur. See aspect graph. The fig- 
ure shows the placement of a camera 
on the viewsphere: [NG99] 


vignetting: Darkening of the corners of 
an image relative to the image cen- 
ter, which is related to the degree to 
which the points are off the optical 
axis. [Low91:3.11] 


virtual bronchoscopy: Creation of 
virtual views of the pulmonary 
system based on, e.g., magnetic 
resonance imaging as a replace- 
ment for endoscope imaging. 
[WP:Bronchoscopy] 


virtual endoscopy: Simulation of a tra- 
ditional endoscopy procedure using a 
virtual reality representation of physio- 
logical data such as that obtained by an 
X-ray CAT-scan or magnetic resonance 
imaging. [Vin96] 


virtual reality: The use of computer 
graphics and other interaction tools 
to confer on a user the sensation of
being in, and interacting with, an
alternative environment. This includes
simulation of visual, aural and haptic cues.
The visual environment may be dis- 
played by rendering a 3D model of 
the world into a head-mounted display 
whose viewpoint is tracked in 3D so 
that the user’s head movements gen- 
erate images corresponding to their 
viewpoint. Alternatively, the user may 
be placed in a computer-augmented 
virtual environment (CAVE), where as 
much as possible of the user’s field 
of view can be manipulated by the 
controlling computer.
[WP:Virtual_reality]


virtual view: Visualization of a model 
from a particular viewpoint. [MTG97] 


viscous model: A deformable model 
based on the concept of a viscous fluid 
(i.e., a fluid with a relatively high
resistance to flow). [WP:Viscosity]


viseme: A model of the visual appear- 
ance of the face and mouth movements 
that occur when uttering a phoneme. 
[WP:Viseme] 


visibility: Whether or not a particular 
feature is visible from a camera posi- 
tion. [WP:Visibility_(geometry)] 


visibility class: The set of points where
exactly the same portion of an object
or scene is visible. For example, when
viewing the corner of a cube, an
observer can move within roughly
one-eighth of the full viewing space
before entering a new visibility class.
[AMS09]


visibility locus: All camera positions 
from which a particular feature is visi- 
ble. [Goa87] 


visible light: Description of electro- 
magnetic radiation with wavelengths 
between about 400 nm (blue) and 
700 nm (red), corresponding to the 
range in which the rods and cones 
of the human eye are sensitive. 
[WP:Visible_spectrum] 


VISIONS: The early scene understanding 
system of Hanson and Riseman. [HR78] 


visual appearance: The observed 
appearance of an object, which 
depends on a number of factors,
such as the surface structure,
orientation and reflectance of the object,
the illumination, the spectral sensor 
sensitivity, the light reflected from 
the scene and the depth order- 
ing of other objects in the scene. 
[WP:Visual_appearance] 


visual attention: The process by which 
low-level feature detection directs 
high-level scene analysis and object 
recognition strategies. In humans, the 
results of the process are evident in 
the pattern of fixations and saccades 
in normal observation of the world. 
[WP:Attention]


visual behavior: 1) An observed action, 
or the visually observable portion of an 
action, such as opening a door. 
2) The behavior shown when sensing, 
such as reorienting a camera. 
3) The biological processes and activi- 
ties that enable visual sensing. [IGM82] 


visual codeword: See codebook and bag 
of features. 


visual context: The knowledge of the 
visual environment encountered while 
doing some task that helps with the 
performance of the task. For example, 
when looking for a street sign in a 
new city, experience of this particu- 
lar context (situation) tells one to look 
at the corners of buildings, signs on 
posts at corners, or signs hanging from 
wires in the intersection. Contrast with 
a context-independent visual search 
algorithm that would look everywhere. 
[CJ98] 


visual cortex: A part of the brain dedi- 
cated to the processing of visual infor- 
mation. [WP:Visual_cortex] 


visual cue: An aspect of the informa- 
tion extracted from visual data that 
helps the observer better understand 
the scene. For example, an occluding 
contour helps us understand the depth 
order of objects and shading gives a 
clue to the shape of the surface (see 
shape from shading). [Jac02] 


visual event: When some aspect of 
visual behavior changes, such as enter- 
ing a new location, an agent perform- 
ing a different action or an interaction 
between two agents. [KSH05] 


visual hull: A space-carving method for 
approximating shape from multiple 
images. The method finds the silhou- 
ette contours of a given object in each 
image. The region of space defined 
by each camera and the associated 
image contour imposes a constraint on 
the shape of the target object. The 
visual hull is the intersection of all 


such constraints. As more views are 
taken, the approximation becomes bet- 
ter. See the shaded areas: [FP03:26.4] 


visual illusion: The perception of a 


scene, object or motion not corre- 
sponding to the world actually caus- 
ing the image or sequence. Illusions 
are caused, in general, by the combi- 
nation of special arrangements of the 
visual stimuli, viewing conditions and 
responses of the human vision sys- 
tem. Well-known examples include the 
Ames room (in which two people are 
seen as having very different heights 
in a seemingly normal room) and the 
Ponzo illusion (in which two equal 
segments seem to be different lengths 
when interpreted as 3D projections): 
The well-known ambiguous figure-background
drawings of Gestalt psychology, such as the
chalice-faces pattern, are a related subject.
[CS09:3.2]


visual industrial inspection: The use 


of computer vision techniques in order 
to effect quality control or to con- 
trol processes in an industrial setting. 
[WP:Visual_inspection] 


visual inspection: A general term for 


analyzing a visual image to inspect 
some item, such as might be used for 
quality control on a production line. 
[WP:Visual_inspection] 


visual learning: The problem of learn- 


ing visual models from sets of images 
(examples). In general, knowledge that 
can be used to carry out vision tasks. An 
area of the vast field of automated learn- 
ing. Important applications employing 
visual learning include face recognition 
and image database indexing. See also 
unsupervised learning and supervised 
learning. 


visual localization: The problem of 


estimating the location of a target in 
space given one or more images of 
it. Solutions differ according to sev- 
eral factors including the number of 
input images (one in model-based pose 
estimation; multiple discrete images in 
stereo vision; or a video sequence in 
motion analysis), the a priori knowl- 
edge assumed (i.e., whether camera 
calibration, a full perspective or simpli- 
fied projection model, and a geomet- 
ric model of the target are available). 
[DM02] 


visual navigation: The problem of nav- 


igating (steering) a vehicle through 
an environment using visual data, typ- 
ically video sequences. It is possi- 
ble, under diverse assumptions, to 
determine the distance from obsta- 
cles, the time-to-contact, and the shape 
and identity of the objects in view. 
Both video and range sensors have 
been used, including acoustic sen- 
sors (see acoustic sonar). See also 
visual servoing and visual localization. 
[MS95] 


visual rhythm: 1) A visual summary of 


a complete video, e.g., by displaying 
keyframes and shot transitions in a single
display.

2) A repeating visual pattern, such as an 
observed row of windows. [KLY+01] 


visual routine: Ullman’s 1984 term 
for a subcomponent of a visual sys- 
tem that performs a specific task, 
analogous to a behavior in robotics. 
[WP:Visual_routine] 


visual salience: A (numerical) assess- 
ment of the degree to which pixels or 
areas of a scene attract visual attention. 
The principle of Gestalt organization. 
[WP:Salience_(neuroscience)] 


visual search: The task of search- 
ing an image for a particular pre- 
specified object. Often used as an 
experimental tool in psychophysics. 
[WP:Visual_search] 


visual servoing: Using observed 
motions in the image as feedback 
for guiding a robot. For example, 
moving the robot to visually align 
the robot’s end effector with the 
desired target. Typically, the system 
has little or no a priori knowledge of 
the camera locations, their relation 
to the robot, or the robot kinematics. 
These parameters are learned as the 
robot moves. Visual servoing allows 
the calibration to change during robot 
operation. Such systems can adapt 
well to anomalous conditions, such as 
an arm bending under a load or motor 
slippage, or where calibration may not 
provide sufficient precision to allow 
the desired actions to be reliably pro- 
duced purely from the modeled robot 
kinematics and dynamics. Because 
only image measurements are avail- 
able, the inverse kinematic problem 
may be harder than in conventional 
servoing. [WP:Visual_Servoing] 


visual surveillance: The use of video 
cameras to observe situations usually 
involving people or vehicles. Applica- 
tions using visual surveillance include 
secure area monitoring, crowd flow 
analysis, consumer behavior summariz- 
ing, etc. [WP:Surveillance] 


visual tracking: Links points or objects 
from one video frame to the next during
analysis of a video stream. A normal
prerequisite is to have detected
the item to be tracked in advance, but 
this can happen simultaneously, e.g., 
during mean-shift tracking. Tracking 
is often part of visual surveillance or 
behavior analysis. [FP03:Ch. 17] 


Viterbi algorithm: A dynamic program- 
ming algorithm used to determine the 
most probable explanation for the hid- 
den state sequence in a hidden Markov 
model (HMM) given the observed data. 
[Bis06:13.2.5]
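The dynamic programming recursion behind the Viterbi algorithm can be sketched as follows. This is a minimal illustration rather than a reference implementation: the function name `viterbi` and the array names `pi`, `A` and `B` (initial, transition and emission probabilities) are assumptions introduced here, and the computation is done in the log domain to avoid underflow.

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most probable hidden state path for an HMM (log domain).

    pi[i]   : initial probability of state i
    A[i, j] : transition probability from state i to state j
    B[i, o] : probability of emitting symbol o in state i
    obs     : sequence of observed symbol indices
    """
    with np.errstate(divide="ignore"):          # log(0) = -inf is acceptable here
        log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    T, n = len(obs), len(pi)
    score = np.empty((T, n))                     # best log-probability ending in each state
    back = np.zeros((T, n), dtype=int)           # best predecessor of each state
    score[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        cand = score[t - 1][:, None] + log_A     # cand[i, j]: arrive in j from i
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_B[:, obs[t]]
    path = [int(score[-1].argmax())]             # backtrack from the best final state
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

The forward pass costs O(T n^2) for T observations and n states, compared with O(n^T) for exhaustive enumeration of state sequences.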


vocabulary tree: An organization of 
an image database into a hierarchi- 
cal tree structure where the search 
for matching images compares the 
description of the index image to 
the descriptors at each branch of the 
tree. The tree is constructed from the 
database by hierarchical k-means clustering.
[NS06]


volume: 1) A region of 3D space. A subset
of $\mathbb{R}^3$. A (possibly infinite) 3D point
set.
2) The space bounded by a closed 
surface. [Nal93:9.2] 


volume detection: The detection of 
volume-shaped entities in 3D data sets, 
such as might be produced by a nuclear 
magnetic resonance scanner. [Bot87] 


volume matching: Identification of cor- 
respondence between objects or sub- 
sets of objects using a volumetric 
representation. [BS97] 


volume skeletons: The skeletons of 3D 
point sets, by extension of the defini- 
tions for 2D curves or regions. [SdB03] 


volumetric image: A voxmap or 3D 
array of points where each entry 
typically represents some measure 
of material density or other prop- 
erty in 3D space. Common examples 
include computed axial tomography 
and nuclear magnetic resonance data. 
[MM02] 


volumetric reconstruction: Any of 
several techniques that derive a 
volumetric representation from image 
data. Examples include X-ray CAT, 
space carving and visual hull compu- 
tation. [BM02:3.7.3] 


volumetric representation: A data 
structure by means of which a subset 


of 3D space is represented digitally. 
Examples include voxmap, octree and 
the space bounded by surface repre- 
sentations. [Nal93:9.2] 


volumetric scattering: A physical 
illumination and sensing effect where 
the sensing energy is redistributed 
by materials in the medium through 
which the radiation (e.g., light) travels, 
such as by dust, fog etc. Also known as 
volume scattering. [Max94] 


von Kries hypothesis: A hypothesis 
that the primate color constancy pro- 
cess scales each color channel multi- 
plicatively and independently. This can 
be implemented by multiplication with 
a diagonal matrix. [CGZ07] 
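The diagonal-matrix form of the von Kries hypothesis can be illustrated with a short sketch. It assumes, for simplicity, that the scaling is applied directly to RGB values and that an estimate of the illuminant color is available; the function name `von_kries_correct` is hypothetical.

```python
import numpy as np

def von_kries_correct(image, illuminant_rgb):
    """Scale each color channel independently so the illuminant maps to white.

    image          : array of shape (H, W, 3)
    illuminant_rgb : estimated color of the illuminant (assumed known)
    """
    gains = 1.0 / np.asarray(illuminant_rgb, dtype=float)
    D = np.diag(gains)            # the diagonal von Kries matrix
    return image @ D.T            # multiplies every RGB pixel by D
```

Because D is diagonal, the correction never mixes channels; each channel is simply multiplied by its own gain.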


Voronoi cell: See Voronoi diagram. 


Voronoi diagram: Given n points $x_{1..n}$,
the Voronoi diagram of the point set
is a partition of space into n regions
or cells $R_{1..n}$. Every point p in cell
$R_i$ is closer to point $x_i$ than to any
other $x_j$. The hyperplanes separating
the Voronoi regions are the perpendicular
bisectors of the edges in the
Delaunay triangulation of the point set.
In the figure, the Voronoi diagram of
the four points is the four cells
surrounding them: [SQ04:7.3.2]
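The defining property (each point belongs to the cell of its nearest site) can be sketched directly. This brute-force version only assigns query points to cells and does not construct the cell boundaries; the name `voronoi_labels` is an assumption made for illustration.

```python
import numpy as np

def voronoi_labels(sites, queries):
    """Index of the nearest site for each query point, i.e. its Voronoi cell."""
    sites = np.asarray(sites, dtype=float)       # shape (n, d)
    queries = np.asarray(queries, dtype=float)   # shape (m, d)
    # squared distance from every query point to every site
    d2 = ((queries[:, None, :] - sites[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)
```

For example, with sites at the four corners of a square, every query point is labeled with the corner it lies closest to.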


voxel: A region of 3D space, named 


from “volume element” by analogy 
with “pixel”. Usually a voxel is an axis- 
aligned rectangular solid or cube. A 
component of the voxmap represen- 
tation for a 3D volume. A voxel, like 
a pixel, may have associated attributes 
such as color, occupancy or density of 
some measurement. [FP03:21.3.3] 


voxel carving: See space carving. 


voxel coloring: A method for object 


or scene model recovery from multi- 
ple images. The core idea is to inter- 
sect observation rays from different 
viewpoints into a volumetric space 
and exploit the viewpoint consistency 
constraint, i.e., that true intersections 
should have the same color (the color 
of the surface in the voxel where the 
rays intersect). See also space carving. 
[SD97] 


voxmap: A volumetric representation 


that describes a 3D volume by divid- 
ing space into a regular grid of voxels, 
arranged as a 3D array v(i, j, k). For 
a Boolean voxmap, cell (i, j, k) intersects
the volume iff v(i, j, k) = 1. The
advantages of the representation are 
that it can represent an arbitrarily com- 
plex topology and is fast to look up. 
The major disadvantage is the large use 
of memory, addressed by the octree 
representation. [MPT05] 
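A Boolean voxmap and its constant-time lookup can be sketched with a dense array. The grid size, the sphere used as the example volume and the helper name `occupied` are all illustrative assumptions.

```python
import numpy as np

# Boolean voxmap of a sphere of radius 3 voxels inside an 8 x 8 x 8 grid
n = 8
i, j, k = np.indices((n, n, n))
center = (n - 1) / 2.0
v = ((i - center) ** 2 + (j - center) ** 2 + (k - center) ** 2) <= 3.0 ** 2

def occupied(v, i, j, k):
    """True iff cell (i, j, k) intersects the volume."""
    return bool(v[i, j, k])

# Memory grows with the cube of the grid resolution
memory_bytes = v.nbytes
```

The lookup is a single array access, while the storage cost is one entry per cell regardless of how much of the grid is empty.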


VRML: Virtual Reality Modeling Language
(originally Virtual Reality Markup Language).
A means of defining 3D geometric
models intended for Internet delivery.
[WP:VRML]


Wachspress coordinates: A generalization
of Barycentric coordinates to
represent points in the interior of a 
polygon as a weighted function of the 
polygon’s vertex positions. The gen- 
eralization allows polygons with an 
arbitrary number of vertices (instead 
of the three in Barycentric coordi- 
nates). Wachspress coordinates are 
useful for representing and manipulat- 
ing articulated objects and deformable 
shapes. [Wac75]. 


walkthrough: A classification of the infi- 
nite number of paths between two 
points into one of nine equivalence 
classes of the eight relative directions 
between the points plus the ninth hav- 
ing no movement. In the figure, point 
B is in equivalence class 2 relative to A: 
[BV98] 


Walsh function: The Walsh functions of
order n are a particular set of square
waves $W(n,k): [0, 2^n) \to \{-1, 1\}$ for
k from 1 to $2^n$. They are orthogonal
and the product of Walsh functions is
a Walsh function. The square waves
transition only at integer lattice points
so each function can be specified by
the vector of values it takes on the
points $\{\tfrac{1}{2}, 1\tfrac{1}{2}, \ldots, 2^n - \tfrac{1}{2}\}$.
The collection of these values for a given
order n is the Hadamard transform matrix
$H_{2^n}$ of order $2^n$. The two functions of
order 1 are the rows of

$$H_2 = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$$

and the four of order 2 are the rows of

$$H_4 = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}$$

In general, the functions of order n + 1
are generated by the relation

$$H_{2^{n+1}} = \begin{pmatrix} H_{2^n} & H_{2^n} \\ H_{2^n} & -H_{2^n} \end{pmatrix}$$

and this recurrence is the basis of the
fast Walsh transform. The figure shows
the four Walsh functions of order 2:
[Umb98:2.5.3]
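The recurrence for the Hadamard matrix can be sketched in a few lines. The function name `hadamard` is an assumption; the rows are produced in the natural ordering that the recurrence yields, which is not necessarily the ordering used in the figure.

```python
import numpy as np

def hadamard(n):
    """Hadamard matrix of order 2^n, built by repeatedly doubling [[H, H], [H, -H]]."""
    H = np.array([[1]])
    for _ in range(n):
        H = np.block([[H, H], [H, -H]])
    return H
```

The rows are mutually orthogonal, so `hadamard(n) @ hadamard(n).T` is `2**n` times the identity matrix.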


Walsh transform: Expression of a $2^n$-element
vector v in terms of a basis of
order n Walsh functions; the multiplication
by the corresponding Hadamard
matrix. The Walsh transform has applications
in image coding, logic design
and the study of genetic algorithms.
[Umb98:2.5.3]
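The multiplication by the Hadamard matrix need not be carried out explicitly. The following sketch of a fast Walsh transform uses the butterfly structure implied by the Hadamard doubling recurrence; it is unnormalized, uses the natural (Hadamard) ordering, and the function name `fwht` is an assumption.

```python
import numpy as np

def fwht(v):
    """Fast Walsh-Hadamard transform of a length-2^n vector (unnormalized)."""
    a = np.array(v, dtype=float)
    h = 1
    while h < len(a):
        # combine adjacent blocks of length h as (x + y, x - y)
        for start in range(0, len(a), 2 * h):
            x = a[start:start + h].copy()
            y = a[start + h:start + 2 * h].copy()
            a[start:start + h] = x + y
            a[start + h:start + 2 * h] = x - y
        h *= 2
    return a
```

This takes O(N log N) additions for N = 2^n elements instead of the O(N^2) operations of a matrix product, and applying it twice returns N times the original vector.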


Waltz line labeling: A scheme for the 
interpretation of line images of poly- 
hedra in the blocks world. Each image 
line is labeled to indicate what class of
scene edge gave rise to it: concave, 
convex, occluding, crack or shadow. 
By including the constraints supplied 
by junction labeling in a constraint 
satisfaction problem, Waltz demon- 
strated that collections of lines whose 
labels were locally ambiguous could 
be globally disambiguated. The figure 
shows a simple example of Waltz line 
labeling with concave edges (—), con- 
vex edges (+) and occluding edges 
(>): [BB82:9.5.3] 


warping: Transformation of an image by
reparameterization of the 2D plane.
Given an image $I(x)$ and a 2D-to-2D
mapping $w: x \mapsto x'$, the warped
image $W(x)$ is $I(w(x))$. Warping functions
w are often designed so that certain
control points $p_{1..n}$ in the source
image are mapped to specified locations
$p'_{1..n}$ in the destination image.
See also image morphing. The figure
shows an original image, $I(x)$; a warping
function represented by arrows
joining points x to $w^{-1}(x)$; and the
warped image $W(x)$: [SB11:7.10]
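The relation W(x) = I(w(x)) means that each destination pixel looks up its value in the source image. A minimal sketch with nearest-neighbor sampling follows; the function name `warp`, the (row, column) coordinate convention and the zero fill for out-of-range samples are assumptions made for illustration.

```python
import numpy as np

def warp(I, w):
    """Return W with W(x) = I(w(x)), using nearest-neighbor sampling.

    w maps arrays of destination (row, col) coordinates to source coordinates.
    Destination pixels whose source falls outside the image are set to 0.
    """
    rows, cols = np.indices(I.shape)
    src_r, src_c = w(rows, cols)
    src_r = np.rint(src_r).astype(int)
    src_c = np.rint(src_c).astype(int)
    inside = ((src_r >= 0) & (src_r < I.shape[0]) &
              (src_c >= 0) & (src_c < I.shape[1]))
    W = np.zeros_like(I)
    W[inside] = I[src_r[inside], src_c[inside]]
    return W
```

For instance, `warp(I, lambda r, c: (r, c + 1))` shifts the image content one pixel to the left. Practical implementations usually replace the rounding step with bilinear or higher-order interpolation.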


watermark: See digital watermarking. 
[WP:Watermark] 


watershed segmentation: Image
segmentation by means of the
watershed transform. A typical
implementation proceeds thus:
• Detect edges.
• Compute the distance transform D
of the edges.
• Compute watershed regions in −D.
The figure shows a) the original
image; b) Canny edges; c) the distance
transform; d) region boundaries
of the watershed transform of (c);
e) mean color in the watershed regions;
f) regions overlaid on the image:
[SB11:10.10]


watershed transform: A tool for 


morphological image segmentation. 
The watershed transform views the 
image as an elevation map, with each 
local minimum in the map given a 
unique integer label. The watershed 
transform of the image assigns to each 
non-minimum pixel, p, the label of 
the minimum to which a drop of water 
would fall if placed at p. Points on 
“ridges” or watersheds of the elevation 
map that could fall into one of two min- 
ima are called watershed points and 
the set of pixels surrounding each min- 
imum that share its label are called 
watershed regions. Efficient algorithms 
exist for the computation of the water- 
shed transform. The figure shows an 
image with minima superimposed; the 
same image viewed as a 3D eleva- 
tion map; and the watershed trans- 
form of the image, where different min- 
ima have different colored regions and 
watershed pixels are shown in white; 
a particular watershed is indicated by 
arrows: [SB11:10.10] 
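The "drop of water" description can be sketched as steepest descent on the elevation map. This is a simplified illustration, not one of the efficient algorithms referred to above: it assumes 4-connectivity and no flat plateaus, breaks ties by neighbor order rather than marking watershed points, and the name `watershed_descent` is hypothetical.

```python
import numpy as np

def watershed_descent(elev):
    """Label each pixel with the local minimum reached by steepest descent."""
    H, W = elev.shape
    nbrs = [(-1, 0), (1, 0), (0, -1), (0, 1)]

    def lowest_neighbour(r, c):
        best = (r, c)
        for dr, dc in nbrs:
            rr, cc = r + dr, c + dc
            if 0 <= rr < H and 0 <= cc < W and elev[rr, cc] < elev[best]:
                best = (rr, cc)
        return best

    labels = np.zeros((H, W), dtype=int)   # 0 means "not yet labeled"
    next_label = 1
    for r in range(H):
        for c in range(W):
            path, p = [], (r, c)
            while labels[p] == 0:
                path.append(p)
                q = lowest_neighbour(*p)
                if q == p:                 # p is a local minimum: give it a new label
                    labels[p] = next_label
                    next_label += 1
                    break
                p = q
            for s in path:                 # every pixel on the path drains to the same minimum
                labels[s] = labels[p]
    return labels
```

Each pixel is visited once along some descent path, so the labeling is linear in the number of pixels under the stated assumptions.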
wavelength: The distance between successive
peaks of the wave. Denoted $\lambda$,
it is the wave’s speed divided by the frequency.
Electromagnetic waves, particularly
visible light, are important in
computer vision, with wavelengths of
the order of 400-700 nm. [Hec87:2.2]


wavelet: A function $\phi(x)$ that has certain
properties that mean it can
be used to derive a set of basis
functions in terms of
which other functions can be approximated.
Comparing to the Fourier
transform basis functions, note that
they can be viewed as a set of
scalings and translations of
$f(x) = \sin(\pi x)$, e.g.,
$\cos(3\pi x) = \sin(3\pi x + \tfrac{\pi}{2}) = f(3x + \tfrac{1}{2})$.
Similarly, a wavelet
basis is made from a mother wavelet
$\phi(x)$ by translating and scaling: each
basis function $\phi_{jk}(x)$ is of the form
$\phi_{jk}(x) = \mathrm{const} \cdot \phi(2^{-j}x - k)$.
The conditions on $\phi$ ensure that different basis
functions (i.e., with different j and
k) are orthonormal. There are several
popular choices (e.g., by Haar and
Daubechies) for $\phi$, that trade off various
desirable properties, such as compactness
in space and time, and the
ability to approximate certain classes
of functions. The figure shows the
mother Haar transform wavelet and
some of the derived wavelets $\phi_{j,k}$:
[Mal89]


wavelet descriptor: Description of a
shape in terms of the coefficients of a 
wavelet decomposition of the original 
signal, in a manner similar to Fourier 
shape descriptors for 2D curves. See 
also wavelet transform. [FP03:22.5.2] 


wavelet transform: Representation of a 
signal in terms of a basis of wavelets. 
Similar to the Fourier transform, but as 
the wavelet basis is a two-parameter 
family of functions $\phi_{jk}$, the wavelet
transform of a d-D signal is a (d + 1)-D
function. However, the number of
distinct values needed to represent the 
transform of a discrete signal of length 
n is just O(n). The wavelet transform 
has similar applications to the Fourier 
transform, but the wavelet basis offers 
advantages when representing natural 
signals such as images. [Umb98:2.5.5] 
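The O(n) size of the discrete transform can be seen in a sketch of the Haar case, the simplest wavelet basis. The function name `haar_transform` and the coefficient ordering (overall average first, then details from coarse to fine) are assumptions made for illustration.

```python
import numpy as np

def haar_transform(signal):
    """Orthonormal Haar wavelet transform of a length-2^n signal."""
    a = np.asarray(signal, dtype=float)
    details = []
    while len(a) > 1:
        avg = (a[0::2] + a[1::2]) / np.sqrt(2.0)    # coarser approximation
        diff = (a[0::2] - a[1::2]) / np.sqrt(2.0)   # detail lost by averaging
        details.append(diff)
        a = avg
    # coarsest average first, then details from coarse to fine
    return np.concatenate([a] + details[::-1])
```

The output has exactly as many coefficients as the input, and because the basis is orthonormal the sum of squared coefficients equals the sum of squared samples.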


wavelet tree: In text indexing, a struc- 


ture that may be used to compress 
data and text. It works by converting 
a string into a balanced binary tree of 
bit vectors where 0 replaces half of the 
symbols and 1 replaces the other half. 
The tree is filtered and re-encoded so 
that the resulting ambiguity lessens and 
this sequence is repeated until no ambi- 
guity is left. [FGM06] 


weak calibration: The process of esti- 


mating the epipolar geometry, e.g., 
the fundamental matrix, from a set 
of feature point correspondences 
matched between a pair of images cap- 
tured from uncalibrated perspective 
cameras. [PGM96] 


weak learner: A learner that is guaran- 


teed (with high probability) to perform 
better than chance but with perfor- 
mance poorly aggregated against the 
ground truth for a given problem. 
Contrast with strong learner. Boosting 
is a method to combine many weak 
learners to produce a strong learner. 
[Bis06:14.3] 


weak perspective: An approximation
of viewing geometry
between the pinhole camera or
full perspective camera and the
orthographic imaging model. The
projection of a homogeneous 3D
point $X = (X, Y, Z, 1)^\top$ is given by the
formula

$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \end{pmatrix} X$$

as for the affine camera but with the
additional constraint that the vectors
$(p_{11}, p_{12}, p_{13})$ and $(p_{21}, p_{22}, p_{23})$ are
scaled rows of a rotation matrix, i.e.,

$$p_{11} p_{21} + p_{12} p_{22} + p_{13} p_{23} = 0$$
[TV98:2.2.4]
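The structure of the weak perspective projection matrix can be sketched as follows. The helper names `weak_perspective_matrix` and `project`, and the parameterization by a rotation `R`, an image translation `t` and a single `scale`, are assumptions introduced for illustration.

```python
import numpy as np

def weak_perspective_matrix(R, t, scale):
    """2 x 4 matrix whose left 2 x 3 block is `scale` times the first two rows of R."""
    P = np.zeros((2, 4))
    P[:, :3] = scale * np.asarray(R, dtype=float)[:2, :]
    P[:, 3] = t
    return P

def project(P, X):
    """Project a 3D point X = (X, Y, Z), given in non-homogeneous form."""
    return P @ np.append(X, 1.0)
```

Since the two scaled rows come from a rotation matrix, their dot product is zero, which is the constraint stated in the entry.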


weakly calibrated stereo: Any two- 


view stereo algorithm for which the 
only calibration information needed 
is the fundamental matrix between 
the cameras is said to be weakly 
calibrated. In the general, multi-view 
case, the camera calibration is known 
up to a projective ambiguity. Weakly 
calibrated systems cannot determine 
Euclidean properties such as absolute 
scale but return results that are pro- 
jectively equivalent to the Euclidean 
reconstructions. [FP03:10.1.5] 


weakly supervised learning: A gen- 


eral distinction in machine learning 
is between supervised learning and 
unsupervised learning. However, the 
supervisory information provided may 
be weak relative to the task at hand. 
For example in learning to carry out 
object localization, one may be given 
only the information that an image 
does or does not contain an instance 
of the specified object, but not its loca- 
tion. Another example of weak super- 
vision is multiple instance learning. 
[Mur12:1.5.4] 


wearable camera: A video camera 


that can be carried on a person’s 
body or head for the purpose of 
analyzing, recording or transmitting 
what that person sees. It can be used 
for military purposes or as part of an 
augmented reality system wherein a 
data projector annotates the observed 
environment with useful information. 
[WP:Wearable_computer] 


Weber’s Law: If a difference can be perceived
between two stimuli of values
$I$ and $I + \delta I$ then it should be possible
to perceive a difference between
two stimuli with different values $J$ and
$J + \delta J$ where $\delta I / I \le \delta J / J$. [Jai89:3.2]


weighted least squares: A least mean 
square estimation process in which 
the data elements also have a weight 
associated. The weights might specify 
the confidence or quality of the data 
item. The use of weights can help 
make the estimation more robust.
[WP:Weighted_least_squares#Weighted_least_squares]
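One common way to compute a weighted least squares estimate is to scale each equation by the square root of its weight and then solve an ordinary least squares problem. The sketch below does this for a linear model; the function name `weighted_least_squares` and the linear form A x = b are assumptions made for illustration.

```python
import numpy as np

def weighted_least_squares(A, b, weights):
    """Minimise sum_i weights[i] * (A[i] @ x - b[i])**2 over x."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    sw = np.sqrt(np.asarray(weights, dtype=float))
    # scaling row i by sqrt(w_i) turns the weighted problem into an ordinary one
    x, *_ = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)
    return x
```

Giving a suspect measurement a weight of zero removes its influence entirely, which is how weighting can make a fit less sensitive to outliers.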


weighted walkthrough: A discrete 
measure of the relative position of two 


regions. The measure is a histogram of 
the walkthrough relative positions of 
every pair of points selected from the 
two regions. [BBV03] 


weld seam tracking: Using visual feed- 


back to control a robotic welding 
device, so it maintains the weld along 
the desired seam. [BLA02]


white balance: A system of color cor- 


rection to deal with differing light con- 
ditions, in order for white objects to 
appear white. [WP:Color_balance] 


white noise: A noise process in which 


the noise power at all frequencies is 
equal (as compared to pink noise). 
When considering spatially distributed 
noise, white noise means that there 
is distortion at every spatial frequency 
(i.e., large distortions as well as small).
[WP:White_noise] 


whitening filter: See noise-whitening 


filter. 


wide angle lens: A lens with a field 


of view greater than about 45°. Wide 
angle lenses allow more information 
to be collected in a single image, 
but often suffer a loss of resolution, 
particularly at the periphery of the 
image. Wide angle lenses are also more 
likely to require correction for nonlin- 
ear lens distortion. [WP:Wide_angle_ 
lens] 


wide-area scene analysis (WASA): 


Video-based surveillance over an area 
larger than a single camera’s view, 
using a network of cameras. [SCK+11] 


wide baseline stereo: The stereo 


correspondence problem when the 
two images for which correspondence 
is to be determined are significantly dif- 
ferent because the cameras are sepa- 
rated by a long baseline. In particular, 
a 2D window around a point in one 
image is expected to look significantly 
different in the second image because 
of foreshortening, occlusion, and light- 
ing effects. [MCUP04] 


wide field of view: Where the optics 


is designed to capture light rays form- 
ing large angles (60° or more) with the 
optical axis. See also wide angle lens, 
panoramic image mosaic, panoramic 
image stereo and plenoptic function
representation. [ZZX04] 


width function: Given a 2D shape
(closed subset of the plane) $S \subset \mathbb{R}^2$,
the width function $w(\theta)$ is the width of
the shape as a function of orientation.
Specifically, the projection
$P(\theta) := \{x\cos\theta + y\sin\theta \mid (x, y) \in S\}$, and
$w(\theta) := \max P(\theta) - \min P(\theta)$. [BN78]
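For a shape represented by a finite set of sample points (an assumption made here; the definition itself is for arbitrary closed sets), the width function can be evaluated directly from the projection. The function name `width` is hypothetical.

```python
import numpy as np

def width(points, theta):
    """Width of a 2D point set along the direction at angle theta (radians)."""
    pts = np.asarray(points, dtype=float)
    proj = pts[:, 0] * np.cos(theta) + pts[:, 1] * np.sin(theta)   # P(theta)
    return proj.max() - proj.min()
```

For the corners of a 4 by 2 axis-aligned rectangle, the width is 4 at theta = 0 and 2 at theta = pi/2.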


Wiener filter: A regularized inverse
convolution filter. Given a signal g that
is known to be the convolution of
an unknown signal f and a known
corrupting signal k, it is desired to
undo the effect of k and recover f.
If (F, G, K) are the respective Fourier
transforms of (f, g, k), then $G = F \cdot K$,
so the inverse filter can recover
$F = G \div K$. In practice, however, G
is corrupted by noise, so that when
an element of K is less than the average
noise level, the noise is amplified.
Wiener’s filter combats this tendency
by adding an estimate of the noise to
the divisor. Because the divisor is complex,
a real formulation is as follows:

$$F = \frac{G}{K} = \frac{GK^*}{KK^*} = \frac{GK^*}{|K|^2}$$

Adding the frequency domain filter
noise estimate N, we obtain the
Wiener reconstruction of F given G
and K:

$$F = \frac{GK^*}{|K|^2 + N}$$
[Jai89:8.3]
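The reconstruction formula can be sketched for a 1D signal using the discrete Fourier transform. The sketch assumes circular convolution and a single scalar noise estimate for all frequencies; the function name `wiener_deconvolve` is hypothetical.

```python
import numpy as np

def wiener_deconvolve(g, k, noise):
    """Estimate f from g = f * k (circular convolution) via F = G K* / (|K|^2 + N)."""
    G = np.fft.fft(g)
    K = np.fft.fft(k, n=len(g))          # zero-pad the kernel to the signal length
    F = G * np.conj(K) / (np.abs(K) ** 2 + noise)
    return np.real(np.fft.ifft(F))
```

With `noise` close to zero this approaches the plain inverse filter; larger values suppress frequencies where |K| is small, at the cost of a smoother reconstruction.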


window: See region of interest. 


window scanning: Separating an image 


into systematic, adjacent and possibly 
overlapping regions of interest in order 
to perform some form of operation. 
Commonly used for object detection 
and localization. 


windowing: Looking at a small portion
of a signal or image through a “window”.
For example, given the vector
$x = \{x_1, \ldots, x_{100}\}$, one might look
at the window of 11 values centered
around 50, $\{x_{45}, \ldots, x_{55}\}$. Often used in order
to restrict some computation such as
the Fourier transform to a small part
of the image. In general, windowing is
described by a windowing function,
which is multiplied by the signal to give
the windowed signal. For example, a
signal $f(x): \mathbb{R}^n \to \mathbb{R}$ and windowing
function $w(\sigma; x)$ are given, where $\sigma$
controls the scale or width of w. Then
the windowed signal is

$$f_w(x) = f(x)\, w(\sigma; x - c)$$

where c is the center of the window.
The figure shows the Bartlett $(1 - \frac{|x|}{\sigma})$,
Hanning $(\frac{1}{2} + \frac{1}{2}\cos\frac{\pi |x|}{\sigma})$, and Gaussian
$(\exp(-\frac{|x|^2}{\sigma^2}))$ windowing functions in
2D: [SOS00:7.1.3]
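Multiplying a signal by a windowing function can be sketched in 1D as follows. The exact normalization of the three windows (in particular the Gaussian) is an assumption of this sketch, as are the function names; the windows are written as functions of the offset r from the window center and the width sigma.

```python
import numpy as np

def bartlett(r, sigma):
    """Triangular window: 1 at the center, 0 at distance sigma and beyond."""
    return np.clip(1.0 - np.abs(r) / sigma, 0.0, None)

def hanning(r, sigma):
    """Raised-cosine window, 0 at distance sigma and beyond."""
    r = np.abs(r)
    return np.where(r <= sigma, 0.5 + 0.5 * np.cos(np.pi * r / sigma), 0.0)

def gaussian(r, sigma):
    """Gaussian window (assumed form); never exactly zero."""
    return np.exp(-(r ** 2) / sigma ** 2)

def windowed(f, x, window, sigma, c):
    """Windowed signal f(x) * w(sigma; x - c) for samples f taken at positions x."""
    return f * window(x - c, sigma)
```

For a signal sampled at positions 1 to 100, `windowed(f, x, bartlett, 5.0, 50.0)` keeps only the samples within 5 positions of x = 50, tapering them toward the edges of the window.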
winged edge representation: A graph


representation for polyhedra in which 
the nodes represent vertices, edges 
and faces. Faces point to bounding 
edge nodes, which point to vertices, 
which point back to connecting edges, 
which point to adjacent faces. The 
term “winged edge” comes from the 
fact that edges have four links that 
connect to the previous and successor 
edges around each of the two faces that 
contain the given edge: [JKS95:13.5.1] 


winner-takes-all: A strategy in which


only the best candidate (e.g., algorithm 
or solution) is chosen and any other 
is abandoned. Commonly found in the 
neural network and learning literature. 
[WP:Winner-take-all] 


wire frame representation: A repre- 


sentation of 3D geometry in terms of 
vertices and edges linking the vertices. 


It does not include descriptions of the 
surface between the edges and, in par- 
ticular, does not include information 
for hidden line removal. The figure 
shows a wire frame model of a cube: 
[BT88:Ch. 8] 


within-class scatter matrix: In a
classification problem, the scatter
matrix for class $C_i$ is defined as
$S_i = \sum_{x \in C_i} (x - m_i)(x - m_i)^\top$ where $m_i$ is
the mean of class i. The within-class
scatter matrix is obtained by summing
the individual scatter matrices, and is
used in the computation of the Fisher
linear discriminant (FLD). See
also between-class scatter matrix.
[Bis06:4.1]
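The sum of the per-class scatter matrices can be computed directly from labeled data. In this sketch the function name `within_class_scatter` and the data layout (one sample per row of `X`, one label per sample) are assumptions.

```python
import numpy as np

def within_class_scatter(X, labels):
    """Sum over classes of the scatter of each class about its own mean."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    S_w = np.zeros((X.shape[1], X.shape[1]))
    for c in np.unique(labels):
        Xc = X[labels == c]
        centred = Xc - Xc.mean(axis=0)      # subtract the class mean
        S_w += centred.T @ centred          # scatter matrix of this class
    return S_w
```

The result is a symmetric d by d matrix for d-dimensional samples, independent of where the class means themselves lie.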


world coordinates: A coordinate 


system useful for placing objects in a 
scene. Usually this is a 3D coordinate 
system with some arbitrarily placed 
origin (e.g., at a corner of a room). 
This contrasts with object-centered 
representations, viewer-centered 
representations or camera coordinates. 
[JKS95:1.4.2] 


Wyner—Ziv video coding: A type of 


lossy video encoding that has low com- 
putational complexity at the encoder 
(e.g., a low-power wireless mobile 
device) and higher computational 
complexity at the decoder. The WZ 
method uses side information, such as 
an estimate of the camera motion.
[WP:Distributed_source_coding#Wyner.E2.80.93Ziv_bound]
X-ray: Electromagnetic radiation of
shorter wavelengths than ultraviolet
light, i.e., less than about 4-40 nm.
Very short X-rays are called gamma
rays. Useful for medical imaging,
because of their power to penetrate
most materials, and for other areas,
such as lithography. [Hec87:3.6.6]


X-ray CAT/CT: Computed axial
tomography or computer-assisted
tomography. A technique for dense
3D imaging of the interior of a material,
particularly the human body. Characterized
by use of an X-ray source
and an imaging system that rotate
around the object being scanned.
[Nev82:10.3.4]


XNOR operator: A combination of two
binary images, A and B, where each
pixel (i, j) in A XNOR B is 0 if exactly
one of A(i, j) and B(i, j) is 1. The
output is the complement of the XOR
operator. In the figure, the image on
the right is the XNOR of the two images
on the left: [SB11:3.2.2]


XOR operator: A combination of two
binary images, A and B, where each
pixel (i, j) in A XOR B is 1 if exactly
one of A(i, j) and B(i, j) is 1. The output
is the complement of the XNOR
operator. In the figure, the image on
the right is the XOR of the two images
on the left: [SB11:3.2.2]
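The XOR and XNOR combinations of two binary images can be sketched with element-wise logical operations. The two tiny example images below are made up for illustration and cover all four input combinations.

```python
import numpy as np

# Two 2 x 2 binary images covering all four input combinations
A = np.array([[0, 0], [1, 1]], dtype=bool)
B = np.array([[0, 1], [0, 1]], dtype=bool)

xor = np.logical_xor(A, B)     # 1 where exactly one of A, B is 1
xnor = np.logical_not(xor)     # the complement: 1 where A and B agree
```

Here `xor` is true only on the anti-diagonal, where the two images differ, and `xnor` is true on the diagonal, where they agree.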
YARF: Yet Another Road Follower.
An autonomous driving system from
Carnegie Mellon University. [KT95]


yaw: A 3D rotation representation component
(along with pitch and roll)
often used for cameras or moving
observers. The yaw component specifies
a rotation about a vertical axis to
give a side-to-side change in orientation.
The figure shows the yaw rotation
direction: [JKS95:12.2.1]


YCrCb: See YUV, where U corresponds
to Cb and V to Cr.


YIQ: Color space used in NTSC
television. Separates luminance (Y)
and two color signals: in-phase
(roughly orange/blue) and quadrature
(roughly purple/green). Conversion
to YIQ from RGB is by
$[Y, I, Q]^\top = M [R, G, B]^\top$ where: [BB82:2.2.5]

$$M = \begin{pmatrix} 0.299 & 0.587 & 0.114 \\ 0.596 & -0.275 & -0.321 \\ 0.212 & -0.523 & 0.311 \end{pmatrix}$$


YUV: A color representation system in
which each point is represented by
luminance (Y) and two chrominance
channels (U, which is blue minus Y, and
V, which is red minus Y). [WP:YUV]
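The RGB to YIQ conversion is a single matrix product per pixel. The sketch below uses the commonly quoted NTSC coefficients, written so that each row of `M` produces one of Y, I and Q; the function name `rgb_to_yiq` is an assumption.

```python
import numpy as np

# Rows give Y, I and Q as weighted sums of R, G and B (commonly quoted NTSC values)
M = np.array([[0.299,  0.587,  0.114],
              [0.596, -0.275, -0.321],
              [0.212, -0.523,  0.311]])

def rgb_to_yiq(rgb):
    """Convert RGB values (last axis of length 3) to YIQ."""
    return np.asarray(rgb, dtype=float) @ M.T
```

The Y row sums to one and the I and Q rows sum to zero, so any gray value (R = G = B) maps to pure luminance with no color signal.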
Zernike moment: A rotationally
invariant moment formed from the
dot product of an image with one of
the Zernike polynomials. The Zernike
polynomial

$$U_n^m(\rho, \phi) = R_n^m(\rho)\, e^{im\phi}$$

is defined in polar coordinates $(\rho, \phi)$
on the plane, only within the unit disk.
When projecting an image, data outside
the unit disk are generally ignored.
The real and imaginary parts are called
the even and odd polynomials respectively.
The radial function $R_n^m(\rho)$ is
given by

$$R_n^m(\rho) = \sum_{l=0}^{(n-m)/2} \frac{(-1)^l\,(n-l)!}{l!\,\left(\frac{n+m}{2}-l\right)!\,\left(\frac{n-m}{2}-l\right)!}\, \rho^{\,n-2l}$$

The Zernike polynomials have a history
in optics, as basis functions for modeling
nonlinear lens distortion. In the
figure, column 1 shows the real
and imaginary parts of $e^{im\phi}$ for m = 1.
Columns 2-4 show the real and imaginary
parts of three Zernike polynomials $U_n^m$:
[Jai89:9.8]
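The radial part of a Zernike polynomial can be evaluated from the standard factorial formula. This is a minimal sketch that favors clarity over numerical robustness for large n; the function name `zernike_radial` is an assumption.

```python
from math import factorial

def zernike_radial(n, m, rho):
    """Radial Zernike polynomial R_n^m(rho) for 0 <= rho <= 1.

    Requires |m| <= n and n - |m| even.
    """
    m = abs(m)
    if m > n or (n - m) % 2:
        raise ValueError("need |m| <= n and n - |m| even")
    total = 0.0
    for l in range((n - m) // 2 + 1):
        num = (-1) ** l * factorial(n - l)
        den = (factorial(l)
               * factorial((n + m) // 2 - l)
               * factorial((n - m) // 2 - l))
        total += num / den * rho ** (n - 2 * l)
    return total
```

For example, `zernike_radial(2, 0, rho)` evaluates the polynomial 2 rho^2 - 1, and every valid radial polynomial takes the value 1 at rho = 1.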


zero-crossing operator: A class of 
feature detector that detects zero 
crossings in the second derivative 
(rather than maxima in the first deriva- 
tive). An advantage of finding zero 


crossings rather than maxima is that 
the edges always form closed curves, 
so that regions are clearly delin- 
eated. A disadvantage is that noise 
is enhanced, so the image must be 
carefully smoothed before the sec- 
ond derivative is computed. A com- 
mon kernel that combines smooth- 
ing and second derivative computation 
is the Laplacian of Gaussian. [FP03: 
8.3.1] 


zero crossing of the Laplacian of a 
Gaussian: See zero-crossing operator. 


zip code analysis: See postal code 
analysis. 


zoom: 1) A change in the effective focal
length of a camera in order to increase 
magnification of the center of the field 
of view. 
2) Refers to the current focal-length set- 
ting of a zoom lens. [Jai89:7.4] 


zoom lens: A lens that allows the effec- 
tive focal length (or “zoom”) to be 
varied after manufacture. Zoom lenses 
may be manipulated manually or elec- 
trically. [WP:Zoom_lens] 


Zucker-Hummel operator: A convolution
kernel for surface detection
in volumetric images. There is one
3 × 3 × 3 kernel for each of the
three derivatives. For example, if
$v(x, y, z)$ is the volume image, $\partial v / \partial z$ is
computed as the convolution of v with the
kernel $D_z(i, j, k) = S(i, j)\, c(k)$, where
$i, j, k \in \{1, 2, 3\}$, $c = [-1, 0, 1]$,

$$S = \begin{pmatrix} a & b & a \\ b & 1 & b \\ a & b & a \end{pmatrix},$$

$a = 1/\sqrt{3}$ and $b = 1/\sqrt{2}$. The kernels
for $\partial v / \partial x$ and $\partial v / \partial y$ are permutations of $D_z$
given by $D_x(i, j, k) = D_z(j, k, i)$ and
$D_y(i, j, k) = D_z(k, i, j)$. [BB82:3.3.3]


Zuniga—Haralick operator: A corner 
detection operator that is based on 
the coefficients of a cubic polyno- 
mial approximating the local neighbor- 
hood. [ZH83] 